Cheat Mastermind and Explainable AI

How a child’s puzzle game gives insight into more human-like explanations of AI decisions

Many of you will have played Mastermind, the simple board game with coloured pegs where you have to guess a hidden pattern.  At each turn the person with the hidden pattern scores the challenger’s guess, until the challenger finds the exact colours and arrangement.

As a child I imagined a variant, “Cheat Mastermind”, where the hider is allowed to change the hidden pegs mid-game so long as the new arrangement is consistent with all the scores given so far.

This variant gives the hider a more strategic role, but also changes the mathematical nature of the game.  In particular, a good hider can always give the least helpful score that is consistent with play so far, so a challenger adopting a minimax strategy will actually experience their worst case.
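
The cheating hider’s constraint can be made concrete in a few lines of code. This is a minimal sketch of my own (not from any published rules): a standard Mastermind scoring function plus a check that a candidate secret reproduces every score already given.

```python
from collections import Counter

def score(secret, guess):
    """Standard Mastermind score as (black, white) pegs:
    black = right colour in the right position;
    white = right colour, wrong position."""
    black = sum(s == g for s, g in zip(secret, guess))
    # colour matches ignoring position, counted with multiplicity
    overlap = sum((Counter(secret) & Counter(guess)).values())
    return black, overlap - black

def consistent(candidate, history):
    """The cheating hider may switch to any candidate secret that
    reproduces every score already given."""
    return all(score(candidate, guess) == given for guess, given in history)

# one guess so far, scored (1 black, 0 white)
history = [(("red", "blue"), (1, 0))]
assert consistent(("red", "green"), history)     # also scores (1, 0) – a legal switch
assert not consistent(("blue", "red"), history)  # would have scored (0, 2) – illegal
```

The hider’s move in Cheat Mastermind is then just a search over codes satisfying `consistent`, picking whichever leaves the challenger worst off.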

More recently, as part of the TANGO project on hybrid human-AI decision making, we realised that the game can be used to illustrate a key requirement for explainable AI (XAI).  Nick Chater and Simon Myers at Warwick have been looking at theories of human-to-human explanations and highlighted the importance of coherence: the need for consistency between the explanations we give for a decision now and our future decisions.  If I explain a food choice by saying “I prefer sausages to poultry”, you would expect me to subsequently choose sausages if given a choice.

Cheat Mastermind captures this need to make our present decisions consistent with those in the past.  Of course, in the simplified world of puzzles this is a perfect match, but in real-world decisions things are more complex.  Our explanations are often ‘local’ in the sense that they are about a decision in a particular context, but still, if future decisions disagree with earlier explanations, we need to be able to give a reason for the exception: “turkey dinners at Christmas are traditional”.

Machine learning systems and AI offer various forms of explanation for their decisions or classifications.  In some cases it may be a nearby example from training data, in some cases a heat map of areas of an image that were most important in making a classification, or in others an explicit rule that applies locally (in the sense of ‘nearly the same’ data).  The way these are framed initially is very formal, although they may be expressed in more humanly understandable visualisations.

Crucially, because these start in the computer, most can be checked or even executed (in the case of rules) by the computer.  This offers several possible strategies for ensuring future consistency or at least dealing with inconsistency … all very like human ones.

  1. highlight inconsistency with previous explanations: “I know I said X before, but this is a different kind of situation”
  2. explain inconsistency with previous explanations: “I know I said X before, but this is different because of Y”
  3. constrain consistency with previous explanations by adding the previous explanation “X” as a constraint when making future decisions. This may only be possible with some kinds of machine learning algorithms.
  4. ensure consistency by using the previous explanation “X” as the decision rule when the current situation is sufficiently close; that is completely bypass the original AI system.
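
Strategies 3 and 4 can be sketched very simply. In this hypothetical toy (all names invented), previously given explanations are stored as explicit condition–decision rules; when a new situation is close enough to match a stored rule, the rule itself decides (strategy 4), otherwise the underlying model is consulted:

```python
# Toy sketch of strategies 3/4: stored explanations constrain later decisions.
# `model` stands in for any underlying (type 1) classifier.

def model(situation):
    # placeholder for the opaque AI decision
    return "poultry"

past_explanations = []   # rules of the form (condition, decision)

def decide(situation):
    for condition, decision in past_explanations:
        if condition(situation):
            # the stored explanation *is* the decision rule,
            # completely bypassing the original AI system
            return decision
    return model(situation)

# an explanation given earlier: "for ordinary meals, I prefer sausages"
past_explanations.append(
    (lambda s: s.get("meal") == "ordinary", "sausages"))

assert decide({"meal": "ordinary"}) == "sausages"   # bound by the earlier explanation
assert decide({"meal": "christmas"}) == "poultry"   # the exception falls through to the model
```

Strategy 3 proper would instead feed `past_explanations` back into the training or constraint set of `model`, which, as noted above, is only possible for some kinds of learning algorithm.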

The last mimics a crucial aspect of human reasoning: by being forced to reflect on our unconscious (type 1) decisions, we create explicit understanding and then may use this in more conscious rational (type 2) decision making in the future.

Of course, strategy 3 is precisely Cheat Mastermind.

Clippy returns!

Helpful suggestions aren’t helpful if they block what you are doing. You would think Microsoft would have learned that lesson with Clippy.

For those who don’t remember Clippy, it was an early AI agent incorporated into Office products.  If you were in Word and started to type “Dear Sam”, Clippy would pop up, say “it looks like you are writing a letter” and offer potentially helpful suggestions.  The problem was that Clippy was a modal dialog, that is, while it was showing you couldn’t type.  So if you were in the middle of typing “Dear Sam, Thank you for your letter …”, everything after the point Clippy appeared would be lost.  This violates a critical rule of appropriate intelligence: while Clippy did “good things when it was right”, it did not avoid doing “bad things when it wasn’t” 🙁

Not surprisingly, Clippy was withdrawn many years ago.

However, now in Outlook (web version) shades of Clippy return.  If you make a typo or spelling mistake, it is marked with an underline like this.

This is a trivial typo: a semi-colon instead of an apostrophe in “can’t”.  So I go to correct it by clicking just after the semi-colon and then typing delete followed by an apostrophe.  However, the text does not change!  This is because the spelling checker has ‘helpfully’ popped up a dialog box with spelling suggestions …

… but the dialog is modal!  So, what I type is simply thrown away.  In this case it is possible to select the correct spelling, but only after it has interrupted my flow of editing.  If no suggestion is correct, one has to either click somewhere else in the message or click the ‘stop’ icon on the bottom left of the box (the two have slightly different meanings) to make the box go away, and then continue to type what you were trying to type in the first place.

Design takeaway:  Be very cautious when using modal dialog boxes, especially when they may appear unexpectedly.

Another year – running and walking, changing roles and new books

Yesterday I completed the Tiree Ultramarathon, I think my sixth since they began in 2014. As always a wonderful day and a little easier than last year. This is always a high spot in the year for me, and also corresponds to the academic year change, so a good point to reflect on the year past and year ahead.  Lots of things including changing job role, books published and in preparation, conferences coming to Wales … and another short walk …

Tiree Ultra and Tech Wave

Next week there will be a Tiree Tech Wave, the first since Covid struck. Really exciting to be doing this again, with a big group coming from Edinburgh University, who are particularly interested in co-design with communities.

Aside: I nearly wrote “the first post-Covid Tiree Tech Wave”, but I am very aware that for many the impact of Covid is not past: those with long Covid, immunocompromised people who are at almost as much risk now as at the peak of the pandemic, and patients in hospital where Covid adds considerably to mortality.

Albrecht Schmidt from Ludwig-Maximilians-Universität München was here again for the Ultra. He’s been several times since first coming in the year of forty-mile-an-hour winds and rain all day … he is built of stern stuff.  Happily, yesterday was a little more mixed, wind and driving rain in the morning and glorious sunshine from noon onwards … a typical Tiree day 😊

We have hatched a plan to have Tiree Tech Wave next year immediately after the Ultra. There are a number of people in the CHI research community interested in technology related to outdoors, exercise and well-being, so hoping to have that as a theme and perhaps attract a few of the CHI folk to the Ultra too.

Changing roles

My job roles have changed over the summer.

I’ve further reduced my hours as Director of the Computational Foundry to 50%. University reorganisation at Swansea over the last couple of years has created a School of Mathematics and Computer Science, which means that some of my activities helping to foster research collaboration between CS and Maths fall more within the School’s role. So, this seemed a good point to scale back and focus more on cross-University digital themes.

However, I will not be idle! I’ve also started a new PT role as Professorial Fellow at Cardiff Metropolitan University. I have been a visiting professor at the Cardiff School of Art and Design for nearly 10 years, so this is partly building on many of the existing contacts I have there. However, my new role is cross-university, seeking to encourage and grow research across all subject areas. I’ve always struggled to fit within traditional disciplinary boundaries, so very much looking forward to this.

Books and writing

This summer has also seen the publication of “TouchIT: Understanding Design in a Physical-Digital World“. Steve, Devina, Jo and I first conceived this when we were working together on the DePTH project, which ran from 2007 to 2009 as part of the AHRC/EPSRC funded Designing for the 21st Century Initiative. The first parts were written in 2008 and 2009 during my sabbatical year when I first moved to Tiree and Steve was our first visitor. But then busyness of life took over until another spurt in 2017 and then much finishing off and updating. However now it is at long last in print!

Hopefully not so long in the process, three more books are due to be published in this coming year, all around an AI theme. The first is a second edition of the “Introduction to Artificial Intelligence” textbook that Janet Finlay and I wrote way back in 1996. This has stayed in print and even been translated into Japanese. For many years the fundamentals of AI changed only slowly – the long ‘AI winter’. However, over recent years things have changed rapidly, not least driven by massive increases in computational capacity and availability of data; so it seemed like a suitable time to revisit this. Janet’s world is now all about dogs, so I’ve taken up the baton. Writing the new chapters has been easy; the editing to make the whole flow as a single volume has been far more challenging, but after a focused writing week in August, it feels as though I’ve broken the back of it.

In addition, there are two smaller volumes in preparation as part of the Routledge and CRC AI for Everything series. One is with Clara Crivellaro on “AI for Social Justice“, the other a sole-authored “AI for Human–Computer Interaction”.

All of these were promised in 2020 early in the first Covid lockdown, when I was (rather guiltily) finding the time tremendously productive. However, when the patterns of meetings started to return to normal (albeit via Zoom), things slowed down somewhat … but now I think (hope!) all on track 😊

Welcoming you to Wales

In 2023 I’m chairing and co-chairing two conferences in Swansea. In June, ACM Engineering Interactive Computing Systems (EICS 2023) and in September the European Conference on Cognitive Ergonomics (web site to come, but here is ECCE 2022). We also plan to have a Techwave Cymru in March. So I’m looking forward to seeing lots of people in Wales.

As part of the preparation to EICS I’m planning to do a series of regular blog posts on more technical aspects of user interface development … watch this space …

Alan’s on the road again

Nearly ten years ago, in 2013, I walked around Wales, a personal journey and research expedition. I always assumed I would do ‘something else’, but time and life took over. Now, the tenth anniversary is upon me and it feels time to do something to mark it.

I’ve always meant to edit the day-by-day blogs into a book, but that certainly won’t happen next year. I will do some work on the dataset of biodata, GPS, text and images that has been used in a few projects and is still unique, including, I believe, the largest single ECG trace in the public domain.

However, I will do ‘something else’.

When walking around the land and ocean boundaries of Wales, I was always aware that while in some sense this ‘encompassed’ the country, it was also the edge, the outside. To be a walker is to be a voyeur, catching glimpses, but never part of what you see.  I started then to think of a different journey, to the heart of Wales, which for me, being born and brought up in Cardiff, is the coal valleys stretching northwards and outwards. The images of coal blackened miners faces and the white crosses on the green hillside after Aberfan are etched into my own conception of Wales.

So, there will be an expedition, or rather a series of expeditions, walking up and down the valleys, meeting communities, businesses, schools and individuals.

Do you know places or people I should meet?

Do you want to join me to show me places you know or to explore new places?

Sampling Bias – a tale of three Covid news stories

If you spend all your time with elephants, you might think that all animals are huge. In any experiment, survey or study, the results we see depend critically on the choice of people or things we consider or measure.

Three recent Covid-19 news stories show the serious (and in one case less serious) impact of sampling bias, potentially creating misleading or invalid results.

  • Story 1 – 99.9% of deaths are unvaccinated – An ONS report in mid-September was widely misinterpreted and led to the mistaken impression that virtually all UK deaths were amongst those who were unvaccinated.  This is not true: whilst vaccination has massively reduced deaths and serious illness, Covid-19 is still a serious illness even for those who are fully jabbed.
  • Story 2 – Lateral flow tests work – They do! False positives are known to be rare (if it says you’ve got it you probably have), but data appears to suggest that false negatives (you get a negative result, but actually have Covid) are much higher.  Researchers at UCL argue that this is due to a form of sampling bias and attempt to work out the true figure … although in the process they slightly overshoot the mark!
  • Story 3 – Leos get their jabs – Analysis of vaccination data in Utah found that those with a Leo star sign were more than twice as likely to be vaccinated as Libras or Scorpios.  While I’d like to believe that Leos are innately more generous of spirit, does your star sign really influence your likelihood of getting a jab?

In the last story we also get a bit of confirmation bias and the file-drawer effect to add to the sampling bias theme!

Let’s look at each story in more detail.

Story 1 – 99.9% of deaths are unvaccinated

I became aware of the first story when a politician on the radio said that 99.9% of deaths in the UK were of unvaccinated people.  This was said I think partly to encourage vaccination and partly to justify not requiring tougher prevention measures.

The figure surprised me for two reasons:

  1. I was sure I’d seen figures suggesting that there were still a substantial number of ‘breakthrough infections’ and deaths, even though the vaccinations were on average reducing severity.
  2. As a rule of thumb, whenever you hear anything like “99% of people …” or “99.9% of times …”, then 99% of the time (sic) the person just means “a lot”.

Checking online newspapers when I got home I found the story that had broken that morning (13th Sept 2021) based on a report by the Office of National Statistics, “Deaths involving COVID-19 by vaccination status, England: deaths occurring between 2 January and 2 July 2021“.  The first summary finding reads:

In England, between 2 January and 2 July 2021, there were 51,281 deaths involving coronavirus (COVID-19); 640 occurred in people who were fully vaccinated, which includes people who had been infected before they were vaccinated.

Now 640 fully vaccinated deaths out of 51,281 is a small proportion leading to newspaper headlines and reports such as “Fully vaccinated people account for 1.2% of England’s Covid-19 deaths” (Guardian) or “Around 99pc of victims had not had two doses” (Telegraph).

In fact in this case the 99% figure does reflect the approximate value from the data, the politician had simply added an extra point nine for good measure!

So, ignoring a little hyperbole, at first glance it does appear that nearly all deaths are of unvaccinated people, which then suggests that Covid is pretty much a done deal and those who are fully vaccinated need not worry anymore.  What could be wrong with that?

The clue is in the title of the report “between 2 January and 2 July 2021“.  The start of this period includes the second wave of Covid in the UK.  Critically, while the first few people who received the Pfizer vaccine around Christmas-time were given a second dose 14 days later, vaccination policy quickly changed to leave several months between first and second vaccine doses. The vast majority of deaths due to Covid during this period happened before mid-February, at which point fewer than half a million people had received second doses.

That is, there were very few deaths amongst the fully vaccinated, in large part because there were very few people doubly vaccinated.  Imagine the equivalent report for January to July 2020: of its 50 thousand deaths, none at all would have been amongst the fully vaccinated, simply because no one was.

This is a classic example of sampling bias, the sample during the times of peak infection was heavily biased towards the unvaccinated, making it appear that the ongoing risk for the vaccinated was near zero.
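
A toy simulation shows how strong this effect is. The weekly numbers below are invented for illustration (they are not the ONS figures), but the shape matches the period in question: deaths concentrated early, vaccination concentrated late. Even assuming the vaccine cuts the death rate only ten-fold, the aggregate share of deaths amongst the vaccinated comes out near zero.

```python
# Invented illustrative numbers: a big early wave when almost no one is
# fully vaccinated, then a rollout after the wave has passed.
periods = [
    # (fully_vaccinated, unvaccinated, base deaths per 100k in that period)
    (0.5e6, 50e6, 20.0),   # peak of the wave, hardly anyone double-jabbed
    (5e6,   45e6, 5.0),    # wave declining, rollout under way
    (20e6,  10e6, 0.5),    # wave over, most people vaccinated
]

RISK_REDUCTION = 10  # assumed: vaccination cuts the death rate ten-fold

vax_deaths = sum(v / 1e5 * rate / RISK_REDUCTION for v, u, rate in periods)
unvax_deaths = sum(u / 1e5 * rate for v, u, rate in periods)

share = vax_deaths / (vax_deaths + unvax_deaths)
print(f"vaccinated share of deaths over the whole period: {share:.1%}")  # well under 2%
```

The aggregate looks like near-perfect protection, yet by construction the per-person protection here is only ten-fold: the headline figure is dominated by the weeks when almost nobody was vaccinated.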

The ONS report does make the full data available.  By the end of the period the number who were fully vaccinated had grown to over 20 million. The second wave had long passed and both the Euros and England’s ‘Freedom Day’ had not yet triggered rises in cases. Looking below, we can see the last five weeks of the data, zooming into the relevant parts of the ONS spreadsheet.

Notice that the numbers of deaths amongst the fully vaccinated (27, 29, 29, 48, 63) are between one-and-a-half and twice as high as those amongst the unvaccinated (18, 20, 13, 26, 35).  Note that this is not because the vaccine is not working; by this point the vaccinated population is around twice as large as the unvaccinated (20 million to 10 million). Also, as vaccines were rolled out first to the most vulnerable, these are not comparing similar populations (more sampling bias!).

The ONS do their best to correct for the latter sampling bias and the column (slightly confusingly) labelled “Rate per 100,000 population“, uses the different demographics to estimate the death rate if everyone were in that vaccination bracket. That is, in the week ending 2nd July (last line of the table) if everyone were unvaccinated one would expect 1.6 deaths per 100,000 whereas if everyone were vaccinated, we would expect 0.2 deaths per 100,000.

It is this (buried and complex) figure which is actually the real headline – vaccination is making a ten-fold improvement.  (This is consonant with more recent data suggesting a ten-fold improvement for most groups and a lower, but still substantial four-fold improvement for the over-80s.)  However, most media picked up the easier to express – but totally misleading – total numbers of deaths figures, leading to the misapprehension amongst some that it is “all over”.

To be fair the ONS report includes the caveat:

Vaccinations were being offered according to priority groups set out by the JCVI, therefore the characteristics of the vaccinated and unvaccinated populations are changing over time, which limits the usefulness of comparing counts between the groups.

However, it is somewhat buried and the executive summary does not emphasise the predictably misleading nature of the headline figures.

Take-aways:

  • for Covid – Vaccination does make things a lot better, but the rate of death and serious illness is still significant
  • for statistics – Even if you understand or have corrected for sampling bias or other statistical anomalies, think about how your results may be (mis)interpreted by others

Story 2 – Lateral flow tests work

Lateral flow tests are the quick-and-dirty weapon in the anti-Covid armoury.  They can be applied instantly, even at home; in comparison, the ‘gold standard’ PCR test can take several days to return a result.

The ‘accuracy’ of lateral flow tests can be assessed by comparing with PCR tests.  I’ve put ‘accuracy’ in scare quotes as there are multiple formal measures.

A test can fail in two ways:

  • False Positive – the test says you have Covid, but you haven’t.  – These are believed to be quite rare, partly because the tests are tuned not to give false alarms too often, especially when prevalence is low.
  • False Negative – the test says you don’t have Covid, but you really do. – There is a trade-off in all tests: by calibrating the test not to give too many false alarms, this means that inevitably there will be times when you actually have the disease, but test negative on a lateral flow test.  Data comparing lateral flow with PCR suggests that if you have Covid-19, there is still about a 50:50 chance that the test will be negative.

Note that the main purpose of the lateral flow test is to reduce the transmission of the virus in the population.  If it catches only a fraction of cases this is enough to cut the R number. However, if there were too many false positive results this could lead to large numbers of people needlessly self-isolating and potentially putting additional load on the health service as they verify the Covid status of people who are clear.

So the apparent high chance of false negatives doesn’t actually matter so much except insofar as it may give people a false sense of security.  However, researchers at University College London took another look at the data and argue that the lateral flow tests might actually be better than first thought.

In a paper describing their analysis, they note that a person goes through several stages during the illness; critically, you may test positive on a PCR if:

  1. You actively have the illness and are potentially infectious (called D2 in the paper).
  2. You have recently had the illness and still have a remnant of the virus in your system, but are no longer infectious (called D3 in the paper).

The virus remnants detected during the latter of these (D3) would not trigger a lateral flow test and so people tested with both during this period would appear to be a false negative, but in fact the lateral flow test would accurately predict that they are not infectious. While the PCR test is treated as ‘gold standard’, the crucial issue is whether someone has Covid and is infectious – effectively PCR tests give false positives for a period after the disease has run its course.

The impact of this is that the accuracy of lateral flow tests (in terms of the number of false negatives), may be better than previously estimated, because this second period effectively pollutes the results. There was a systematic sampling bias in the original estimates.

The UCL researchers attempt to correct the bias by using the relative proportion of positive PCR tests in the two stages D2/(D2+D3); they call this ratio π (not sure why).  They use a figure of 0.5 for this (50:50 D2:D3) and use it to estimate that the true positive rate (sensitivity) for lateral flow tests is about 80%, rather than 40%, and correspondingly the false negative rate only about 20%, rather than 60%.  If this is right, then this is very good news: if you are infectious with Covid-19, then there is an 80% chance that lateral flow will detect it.
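
The arithmetic of the correction is simple. Assuming (as I read the paper) that a lateral flow test is not expected to fire during stage D3 at all, the sensitivity observed against PCR understates the true sensitivity by exactly the factor π:

```python
def corrected_sensitivity(observed_sensitivity, pi):
    """Correct lateral-flow sensitivity for PCR positives in stage D3.

    pi = D2 / (D2 + D3): the share of PCR-positive tests that fall in the
    genuinely infectious stage D2.  Observed sensitivity is diluted by the
    D3 tests, which lateral flow is not expected to detect.
    """
    return observed_sensitivity / pi

# With the paper's pi = 0.5, an apparent 40% sensitivity becomes 80%
print(corrected_sensitivity(0.4, 0.5))
```

Note how sensitive the conclusion is to π: every change in the assumed D2:D3 split feeds straight through to the headline figure.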

The reporting of the paper is actually pretty good (why am I so surprised?), although the BBC report (and I’m sure others) does seem to confuse the different forms of test accuracy.

However, there is a slight caveat here, as this all depends on the D2:D3 ratio.

The UCL researchers’ use of 0.5 for π is based on published estimates of the period of detectable virus (D2+D3) and infectiousness (D2).  They also correctly note that the effective ratio will depend on whether the disease is growing or decaying in the population (another form of sampling bias, similar to the issues in measuring the serial interval for the virus discussed in my ICTAC keynote).  Given that the Liverpool study on which they based their own estimates had been during a time of decay, they note that the results may be even better than they suggest.

However, there is yet another sampling bias at work!  The low sensitivity figures for lateral flow always come from tests on asymptomatic individuals.  The test is known to be more accurate when the patient is already showing symptoms.  This means that lateral flow tests would only ever be applied in stage D3 if the individual had never been symptomatic during the entire infectious period of the virus (D2).  Early on it was believed that a large proportion of people may have been entirely asymptomatic; this was perhaps wishful thinking, as it would have made early herd immunity more likely.  However, a systematic review suggested that only between a quarter and a third of cases are never symptomatic, so the impact of negative lateral flow tests during stage D3 will be a lot smaller than the paper suggests.

In summary there are three kinds of sampling effects at work:

  1. inclusion in prior studies of tests during stage D3 when we would not expect nor need lateral flow tests to give positive results
  2. relative changes in the effective number of people in stages D2 and D3 depending on whether the virus is growing or decaying in the population
  3. asymptomatic testing regimes that make it less likely that stage D3 tests are performed

Earlier work ignored (1) and so may under-estimate lateral flow sensitivity. The UCL work corrects for (1), suggesting a far higher accuracy for lateral flow, and discusses (2), which means it might be even better.  However, it misses (3), so overstates the improvement substantially!
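
A back-of-envelope adjustment, entirely my own rather than anything in the paper, suggests how much difference point (3) makes. If only the never-symptomatic fraction of cases is ever lateral-flow tested during D3, the effective π among tested individuals is higher than 0.5, and the correction correspondingly smaller:

```python
def effective_pi(pi, asymptomatic_fraction):
    """Rough adjustment (my own sketch, not from the UCL paper):
    stage-D3 lateral flow tests only happen for never-symptomatic cases,
    so the infectious share of *tested* PCR positives rises."""
    d2, d3 = pi, 1 - pi
    return d2 / (d2 + asymptomatic_fraction * d3)

observed = 0.4                  # apparent sensitivity against PCR
pi = effective_pi(0.5, 0.3)     # if only ~30% of cases are never symptomatic
corrected = observed / pi
print(f"effective pi {pi:.2f}, corrected sensitivity {corrected:.0%}")
# roughly 0.77 and ~52% -- a real improvement over 40%, but far short of 80%
```

On these (rough, assumed) numbers, the true sensitivity lands around 50–55% rather than 80%, which is the sense in which the paper overstates the improvement.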

Take-aways:

  • for Covid – Lateral flow tests may be more accurate than first believed, but a negative test result does not mean ‘safe’, just less likely to be infected.
  • for statistics – (i) Be aware of time-based sampling issues when populations or other aspects are changing.  (ii) Even when you spot one potential source of sampling bias, do dig deeper; there may be more.

Story 3 – Leos get their jabs

Health department officials in Salt Lake County, Utah decided to look at their data on vaccination take-up.  An unexpected result was that there appeared to be a substantial difference between citizens with different birth signs. Leos topped the league table with a 70% vaccination rate whilst Scorpios trailed with less than half vaccinated.

Although I’d hate to argue with the obvious implication that Leos are naturally more caring and considerate, maybe the data is not quite so cut and dried.

The first thing I wonder when I see data like this is whether it is simply a random fluke.  By definition the largest element in any data set tends to be a bit extreme, and this is a county, so maybe the numbers involved are quite large.  However, Salt Lake County is the largest county in Utah with around 1.2 million residents according to the US Census; so, even ignoring children or others not eligible, still around 900,000 people.

Looking at the full list of percentages, it looks like the average take-up is between 55% and 60%, with around 75,000 people per star sign (900,000/12).  Using my quick and dirty rule for this kind of data: look at the number of people in the smaller side (30,000 = 40% of 75,000); take its square root (about 170); and as it is near the middle multiply by 1.5 (~250).  This is the sort of variation one might expect to see in the data.  However 250 out of 75,000 people is only about 0.3%, so these variations of +/-10% look far more than a random fluke.
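
The rule of thumb, and the exact binomial standard deviation it approximates, can be checked in a couple of lines:

```python
import math

n_per_sign = 75_000   # ~900,000 eligible residents / 12 signs
p = 0.4               # the smaller side: ~40% unvaccinated

# Quick-and-dirty rule from the text: sqrt of the smaller count, times 1.5
quick = 1.5 * math.sqrt(p * n_per_sign)

# Exact binomial standard deviation for comparison
exact = math.sqrt(n_per_sign * p * (1 - p))

print(round(quick), round(exact))   # a couple of hundred people either way
# Either estimate is well under 0.5% of 75,000 -- nowhere near the
# +/-10 percentage-point differences seen in the Utah data.
```

Both estimates agree that chance alone should move the per-sign rates by a fraction of a percentage point, so the observed spread needs another explanation.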

The Guardian article about this digs a little deeper into the data.

The Utah officials knew the birth dates of those who had been vaccinated, but not the overall date-of-birth data for the county as a whole.  If this were not uniform by star sign, then it could introduce a sampling bias.  To counteract this, they used national US population data to estimate the numbers in each star sign in the county and then divided their own vaccination figure by these estimated figures.

That is, they combined two sets of data:

  • their own data on birth dates and vaccination
  • data provided (according to the Guardian article) by University of Texas-Austin on overall US population birth dates

The Guardian suggests that in attempting to counteract sampling bias in the former, the use of the latter may have introduced a new bias. The Guardian uses two pieces of evidence for this.

  1. First an article in the journal Public Health Report that showed that seasonal variation in births varied markedly between states, so that comparing individual states or counties with national data could be flawed.
  2. Second a blog post by Swint Friday of the College of Business Texas A&M University-Corpus Christi, which includes a table (see below) of overall US star sign prevalence that (in the Guardian’s words) “is a near-exact inverse of the vaccination one“, thus potentially creating the apparent vaccination effect.

Variations in birth rates through the year are often assumed to be in part due to seasonal bedtime activity: hunkering down as the winter draws in vs. short sweaty summer nights; the Guardian cites a third source, The Daily Viz, to suggest that “Americans like to procreate around the holiday period“. More seriously, the Public Health Report article also links this to seasonal impact on pre- and post-natal mortality, especially in boys.

Having sorted the data in their own minds, the Guardian reporting shifts to the human interest angle, interviewing the Salt Lake health officials and their reasons for tweeting this in the first place.

But … yes, there is always a but … the Guardian fails to check the various sources in a little more detail.

The Swint Friday blog has figures for Leo at 0.063% of the US population whilst Scorpio tops it at 0.094%, with the rest in between.  Together the figures add up to around 1% … what happened to the other 99% of the population … do they not have a star sign?  Clearly something is wrong; I’m guessing the figures are proportions not percentages, but it does leave me slightly worried about the reliability of the source.

Furthermore, the Public Health Report article (below) shows July-Aug (Leo period) slightly higher rather than lower in terms of birth date frequency, as does more recent US data on births.

from PASAMANICK B, DINITZ S, KNOBLOCH H. Geographic and seasonal variations in births. Public Health Rep. 1959 Apr;74(4):285-8. PMID: 13645872; PMCID: PMC1929236

Also, the ratio between largest and smallest figures in the Swint Friday table is about a half of the smaller figure (~1.5:1), whereas in the figure above it is about an eighth and in the recent data less than a tenth.

The observant reader might also notice the date on the graph above, 1955, and that it only refers to white males and females.  Note that this comes from an article published in 1959, focused on infant mortality, and exemplifies the widespread structural racism in the availability of historic health data.  This is itself another form of sampling bias; the reasons for the selection are not described in the paper, perhaps because it was just commonly accepted at the time.

Returning to the data, as well as describing state-to-state variation, the paper also surmises that some of this difference may be due to socio-economic factors and that:

The increased access of many persons in our society to the means of reducing the stress associated with semitropical summer climates might make a very real difference in infant and maternal mortality and morbidity.

Indeed, roll on fifty years: looking at the graph produced at the Daily Viz, based on more recent US government birth data, the variation is indeed far smaller now than it was in 1955.

from How Common Is Your Birthday? Pt. 2., the Daily Viz, Matt Stiles, May 18, 2012

As noted, the data in Swint Friday’s blog is not consistent with either of these sources, and is clearly intended simply as a light-hearted set of tables of quick facts about the Zodiac. The original data for this comes from Statistics Brain, but this requires a paid account to access, and given the apparent quality of the resulting data, I don’t really want to pay to check! So, the ultimate origins of this table remain a mystery, but it appears to be simply wrong.

Given it is “a near-exact inverse” of the Utah star sign data, I’m inclined to believe that this is the source that Utah health officials used, that is, data from Texas A&M University, not the University of Texas at Austin.  So in the end I agree with the Guardian’s overall assessment, even if their reasoning is somewhat flawed.

How is it that the Guardian did not notice these quite marked discrepancies in the data? I think the answer is confirmation bias: they found evidence that agreed with their belief (that Zodiac signs can’t affect vaccination status) and therefore did not look any further.

Finally, we only heard about this because it was odd enough for Utah officials to tweet about it.  How many other things did the Utah officials consider that did not end up interesting?  How many of the other 3000 counties in the USA looked at their star sign data and found nothing?  This is a version of the file-drawer effect for scientific papers, where only the results that ‘work’ get published.  With so many counties and so many possible things to look at, even a 10,000 to 1 event would happen sometimes, but if only the 10,000 to 1 event gets reported, it would seem significant and yet be pure chance.
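The arithmetic of this is easy to check. A minimal sketch in Python, using the illustrative figures from the text (3000 counties, a one-in-10,000 fluke) rather than any real data:

```python
# Chance that at least one of many counties sees a "1 in 10,000" fluke
# purely by chance (illustrative figures from the text, not real data).
counties = 3_000
p_fluke = 1 / 10_000

# P(at least one fluke) = 1 - P(no county sees one)
p_at_least_one = 1 - (1 - p_fluke) ** counties
print(f"{p_at_least_one:.0%}")  # roughly a 26% chance
```

And each county is not looking at just one attribute: multiply by the number of things examined (star signs, blood groups, …) and a reported fluke somewhere becomes almost certain – yet only that fluke makes the news.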

Take-aways:

  • for Covid – Get vaccinated whatever your star sign.
  • for statistics – (i) Take especial care when combining data from different sources to correct sampling bias; you might just create a new bias. (ii) Cross-check sources for consistency, and if they are not consistent, ask why not. (iii) Beware confirmation bias: when the data agrees with what you believe, still check it!  (iv) Remember that historical data and its availability may reflect other forms of human bias. (v) The file-drawer effect – are you only seeing the selected, apparently unusual data?

 

Busy September – talks, tutorials and an ultra-marathon

September has been a full month!

During the last two weeks things have started to kick back into action, with the normal rounds of meetings and induction week for new students.  For the latter I’d pre-recorded a video welcome, so my involvement during the week was negligible.  However, in addition I delivered a “Statistics for HCI” day course organised by the BCS Interaction Group with PhD students from across the globe and also a talk “Designing User Interactions with AI: Servant, Master or Symbiosis” at the AI Summit London.  I was also very pleased to be part of the “60 faces of IFIP” campaign by the International Federation for Information Processing.

It was the first two weeks that stood out though, as I was back on Tiree for two whole weeks.  Not 100% holiday as during the stay I gave two virtual keynotes: “Qualitative–Quantitative Reasoning: thinking informally about formal things” at the International Colloquium on Theoretical Aspects of Computing (ICTAC) in Kazakhstan and “Acting out of the Box” at the University of Wales Trinity St David (UWTSD) Postgraduate Summer School.  I also gave a couple of lectures on “Modelling interactions: digital and physical” at the ICTAC School which ran just before the conference and presented a paper on “Interface Engineering for UX Professionals” in the Workshop on HCI Engineering Education (HCI-E2) at INTERACT 2021 in Bari.  Amazing how easy it is to tour the world from a little glamping pod on a remote Scottish Island.

Of course the high point was not the talks and meetings, but the annual Tiree Ultra-marathon.  I’d missed last year, so it was especially wonderful to be back: thirty-five miles of coastline, fourteen beaches, not to mention so many friendly faces, old friends and new.  Odd of course with Covid zero-contact and social distancing – the usual excited press of bodies at the pre-race briefing in An Talla, the Tiree community hall, replaced with a video webinar, and all a little more widely spaced for the start on the beach too.

The course was slightly different too, anti-clockwise and starting half way along Gott Bay, the longest beach.  Gott Bay is usually towards the end of the race, about 28 miles in, so the long run, often into the wind, is one of the challenges of the race.  I recall in 2017 running the beach with a 40 mile an hour head wind and stinging rain – I knew I’d be faster walking, but was determined to run every yard of beach.  Another runner came up behind me and walked in my shelter.  However, this year had its own sting in the tail with Ben Hynish, the highest point, at 26 miles in.

The first person was across the line in about four-and-a-quarter hours, the fastest time yet.  I was about five hours later!

This was my fifth time doing the ultra, but the hardest yet, maybe in part due to lockdown couch-potato-ness!  My normal training pattern is that about a month before the ultra I think, “yikes, I’ve not run for a year” and then rapidly build up the miles – not the recommended training regime!  This year I knew I wasn’t as fit as usual, so I did start in May … but then got a knee injury, then had to self-isolate … and then it was into the second-half of July; so about a month again.

Next year it will be different, I will keep running through the winter … hmm … well, time will tell!

The different September things all sound very disparate – and they are, but there are some threads and connections.

The first thread is largely motivational.

The UWTSD keynote was about the way we are not defined by the “kind of people” we think of ourselves as being, but by the things we do.  The talk used my walk around Wales in 2013 as the central example, but the ultra would have been just as pertinent.  Someone with my waistline is not who one would naturally think of as being an ultramarathon runner – not that kind of person – but I did it.

However, I was not alone.  The ‘winners’ of the ultra are typically the rangy build one would expect of a long-distance runner, but beyond the front runners, there is something about the long distance that attracts a vast range of people of all ages, and all body shapes imaginable.  For many there are physical or mental health stories: relationship breakdowns, illnesses, that led them to running and through it they have found ways to believe in themselves again.  Post Covid this was even more marked: Will, who organises the ultra, said that many people burst into tears as they crossed the finish line, something he’d never seen before.

The other thread is about the mental tools we need to be a 21st century citizen.

The ICTAC keynote was about “Qualitative–Quantitative Reasoning”, which is my term for the largely informal understanding of numbers that is so important for both day-to-day and professional life, but is not part of formal education.  The big issues of our lives from Covid to Brexit to climate change need us to make sense of large-scale numerical or data-rich phenomena.  These often seem too complex to make sense of, yet are ones where we need to make appropriate choices in both our individual lives and political voices.  It is essential that we find ways to aid understanding in the public, press and politicians – including both educational resources and support tools.

The statistics course and my “Statistics for HCI” book are about precisely this issue – offering ways to make sense of often complex results of statistical analysis and obtain some of the ‘gut’ understanding that professional statisticians develop over many years.

My 60 faces of IFIP statement also follows this broad thread:

“Digital technology is now essential to being a citizen. The future of information processing is the future of everyone; so it needs to be understood and shaped by all. Often ICT simply reinforces existing patterns, but technology is only useful if we can use it to radically reimagine a better world.”


More information on different events

Tiree Ultra

Tiree Ultramarathon web page and Facebook Group

Paper: Interface Engineering for UX Professionals

HCI-E2: Workshop on HCI Engineering Education – for developers, designers and more, INTERACT 2021, Bari, Italy – August 31st, 2021. See more – paper and links

Summer School Lectures: Modelling interactions: digital and physical

Lecture at ICTAC School 2021: 18th International Colloquium on Theoretical Aspects of Computing, Nazarbayev University, Nur-Sultan, Kazakhstan, 1st September 2021. See more – abstract and links

Talk: Designing User Interactions with AI: Servant, Master or Symbiosis

The AI Summit London, 22nd Sept. 2021. See more – abstract and links

Day Course: Statistics for HCI

BCS Interaction Group One Day Course for PhD Students, 21st Sept. 2021.
See my Statistics for HCI Micro-site.

Keynote: Acting out of the Box

Rhaglen Ysgol Haf 2021 PCYDDS / UWTSD Postgraduate Summer School 2021, 10th Sept. 2021. See more – abstract and links

Keynote: Qualitative–Quantitative Reasoning: thinking informally about formal things

18th International Colloquium on Theoretical Aspects of Computing, Nazarbayev University, Nur-Sultan, Kazakhstan, 10th Sept. 2021. See more – full paper and links

Induction week greeting

 

Darwinian markets and sub-optimal AI

Do free markets generate the best AI?  Not always, and this not only hits the bottom line, but comes with costs for personal privacy and the environment.  The combination of network effects and winner-takes-all advertising expenditure means that the resulting algorithms may be worst for everyone.

A few weeks ago I was talking with Maria Ferrario (Queens University Belfast) and Emily Winter (Lancaster University) regarding privacy and personal data.  Social media sites and other platforms are using ever more sophisticated algorithms to micro-target advertising.  However, Maria had recently read a paper suggesting that this had gone beyond the point of diminishing returns: far simpler  – and less personally intrusive – algorithms achieve nearly as good performance as the most complex AI.  As well as having an impact on privacy, this will also be contributing to the ever growing carbon impact of digital technology.

At first this seemed counter-intuitive.  While privacy and the environment may not be central concerns, surely companies will not invest more resources in algorithms than is necessary to maximise profit?

However, I then remembered the peacock tail.


Jatin Sindhu, CC BY-SA 4.0, via Wikimedia Commons
The peacock tail is a stunning wonder of nature.  However, in strict survival terms, it appears to be both flagrantly wasteful and positively dangerous – like eye-catching supermarket packaging for the passing predator.

The simple story of Darwinian selection suggests that this should never happen.  The peacocks that have smaller and less noticeable tails should have a better chance of survival, passing their genes to the next generation, leading over time to more manageable and less bright plumage.  In computational terms, evolution acts as a slow, but effective optimisation algorithm, moving a population ever closer to a perfect fit with its environment.

However, this simple story has twists, notably runaway sexual selection.  The story goes like this.  Many male birds develop brighter plumage during mating season so that they are more noticeable to females.  This carries a risk of being caught by a predator, but there is a balance between the risks of being eaten and the benefits of copulation.  Stronger, faster males are better able to fight off or escape a predator, and hence can afford to have slightly more gaudy plumage.  Thus, for the canny female, brighter plumage is a proxy indicator of a more fit potential mate.  As this becomes more firmly embedded into the female selection process, there is an arms race between males – those with less bright plumage will lose out to those with brighter plumage and hence die out.  The baseline constantly increases.

Similar things can happen in free markets, which are often likened to Darwinian competition.

Large platforms such as Facebook or Google make the majority of their income through advertising.  Companies with large advertising accounts are constantly seeking the best return on their marketing budgets and will place ads on the platform that offers the greatest impact (often measured by click-through) for the least expenditure.  Rather like mating, this is a winner-takes-all choice.  If Facebook’s advertising is 1% more effective than Google’s, a canny advertiser will place all their adverts with Facebook, and vice versa.  Just like the peacock there is an existential need to outdo each other and thus almost no limit on the resources that should be squandered to gain that elusive edge.

In practice there are modifying factors; the differing demographics of platforms mean that one or other may be better for particular companies and also, perhaps most critically, the platform can adjust its pricing to reflect the effectiveness so that click-through-per-dollar is similar.

The latter is the way the hidden hand of the free market is supposed to operate to deliver ‘optimal’ productivity.  If spending 10% more on a process can improve productivity by 11% you will make the investment.  However, the theory of free markets (to the extent that it ever works) relies on an ‘ideal’ situation with perfect knowledge, free competition and low barriers to entry.  Many countries operate anti-collusion and monopoly laws in pursuit of this ‘ideal’ market.

Digital technology does not work like this. 

For many application areas, network effects mean that emergent monopolies are almost inevitable.  This was first noticed for software such as Microsoft Office – if all my collaborators use Office then it is easier to share documents with them if I use Office also.  However, it becomes even more extreme with social networks – although there are different niches, it is hard to have multiple Facebooks, or at least to create a new one – the value of the platform is because all one’s friends use it.

For the large platforms this means that a substantial part of their expenditure is based on maintaining and growing this service (social network, search engine, etc.).  While the income is obtained from advertising, only a small proportion of the costs are developing and running the algorithms that micro-target adverts.

Let’s assume that the ratio of platform to advertising algorithm costs is 10:1 (I suspect it is a lot greater).  Now imagine platform P develops an algorithm that uses 50% more computational power, but improves advertising targeting effectiveness by 10%; at first this sounds a poor balance, but remember that 10:1 ratio.

The platform can charge 10% more whilst being competitive.   However, the 50% increase in advertising algorithm costs is just 5% of the overall company running costs, as 90% are effectively fixed costs of maintaining the platform.  A 5% increase in costs has led to a 10% increase in corporate income.  Indeed one could afford to double the computational costs for that 10% increase in performance and still maintain profitability.
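These percentages are easy to verify. A quick sanity check in Python, using the text’s illustrative 10:1 ratio (the variable names are mine):

```python
# Illustrative cost model: platform (fixed) vs advertising-algorithm costs
# in a 10:1 ratio, as assumed in the text.
platform_cost = 10.0
algorithm_cost = 1.0
old_total = platform_cost + algorithm_cost

# New algorithm: 50% more compute, but 10% better ad targeting,
# so the platform can charge 10% more while staying competitive.
new_total = platform_cost + algorithm_cost * 1.5
cost_increase = new_total / old_total - 1

print(f"costs up {cost_increase:.1%}, income up 10%")  # costs up 4.5%
```

Even doubling the algorithm’s compute (new_total = 12) raises overall costs by only about 9%, still under the 10% income gain – which is exactly the incentive to squander resources described above.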

Of course, the competing platforms will also work hard to develop ever more sophisticated (and typically privacy reducing and carbon producing) algorithms, so that those gains will be rapidly eroded, leading to the next step.

In the end there are diminishing returns for effective advertising: there are only so many eye-hours and dollars in users’ pockets. The 10% increase in advertising effectiveness is not a real productivity gain, but is about gaining a temporary increase in users’ attention, given the current state of competing platforms’ advertising effectiveness.

Looking at the system as a whole, more and more energy and expenditure are spent on algorithms that are ever more invasive of personal autonomy, and in the end yield no return for anyone.

And it’s not even a beautiful extravagance.

A brief history of array indices — making programs that fit people

A colleague recently said to me “As computer scientists, our index always starts with a 0“, and my immediate thought was “not when I was a lad“!
As well as revealing my age, this is an interesting reflection on the evolution of programming languages, and in particular the way that programming languages have in some ways regressed in terms of human-centredness, expecting the human to think like a machine rather than the machine doing the work.
But let’s start with array indices.  If you have programmed arrays in Java, Javascript, C++, PHP, or (lists in) Python they all have array indices starting at 0: a[0], a[1], etc.  Potentially a little confusing for the new programmer, an array of size 5 therefore has last index 4 (five indices: 0,1,2,3,4).  Code is therefore full of ‘length-1’:
double values[] = codeReturningArray();
double first = values[0];
double last = values[values.length-1];
This feels so natural  we hardly notice we are doing it.  However, it wasn’t always like this …
The big three early programming languages were Fortran (for science), Algol (for mathematics and algorithms) and COBOL (for business). In all of these arrays/tables start at 1 by default (reflecting mathematical conventions for matrices and vectors), but both Fortran and Algol could take arbitrary ranges – the compiler did the work of converting these into memory addresses.
Another popular early programming language was BASIC, created as a language for learners in 1964, and the arrays in the original Basic also started at 1.  However, for anyone learning Basic today, it is likely to be Microsoft Visual Basic, used both for small business applications and also scripting office documents such as Excel.  Unlike the original Basic, the arrays in Visual Basic are zero-based arrays ending one less than the array size (like C).  Looking further into the history of this, arrays in the first Microsoft Basic in 1980 (a long time before Windows) allowed 0 as a start index, but Dim A(10) meant there were 11 items in the array, 0–10. This meant you could ignore the zero index if you wanted and use A(1..10) like in earlier BASIC, Fortran, etc., while meaning the compiler had to do less work.

Excerpt from 1964 BASIC manual (download)
In both Pascal and Ada, arrays are more strongly typed, in that the programmer explicitly specifies the index range, not simply a size.  That is, it is possible to declare zero-based arrays A[0..9], one-based arrays A[1..7] or indeed anything else A[42..47].  However, illustrative examples of both Pascal arrays and Ada arrays typically have index types starting at 1 as this was consistent with earlier languages and also made more sense mathematically.
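In a modern zero-based language the same flexibility can be emulated by storing the lower bound and doing the offset arithmetic that the Pascal or Ada compiler once did. A minimal Python sketch (the class name RangedArray is my invention, not from any of these languages):

```python
class RangedArray:
    """Emulate a Pascal/Ada-style array with an arbitrary index range,
    e.g. A[42..47], on top of a zero-based Python list."""

    def __init__(self, low, high, fill=None):
        self.low = low
        self.data = [fill] * (high - low + 1)

    def __getitem__(self, index):
        # the compiler's hidden offset arithmetic, made visible
        return self.data[index - self.low]

    def __setitem__(self, index, value):
        self.data[index - self.low] = value

a = RangedArray(42, 47)
a[42] = "first"
a[47] = "last"
print(a[42], a[47])
```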
It should be noted that most of the popular early languages also allowed matrices or multi-dimensional arrays:
Fortran: DIMENSION A(10,5)
Algol:   mode matrix = [1:3,1:3]real; 
Basic:   DIM B(15, 20)
Pascal:  array[1..15,1..10] of integer;
So, given the rich variety of single and multi-dimensional arrays, how is it that arrays now all start at zero?  Is this the result of deep algebraic or theoretical reflection by the computer science community?  In fact the answer is far more prosaic.
Most modern languages are directly or indirectly influenced by C or one of its offshoots (C++, Java, etc.), and these C-family languages all have zero indexed arrays because C does.
I think this comes originally from BCPL (which I used to code my A-level project at school), which led to B and then C.  Arrays in BCPL were pointer based (as in C), making no distinction between array and pointer.  BCPL treated an ‘array’ declaration as being memory allocation and ‘array access’ (array!index) as pointer arithmetic.  Hence the zero-based array index sort of emerged.
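The reason zero ‘emerged’ is the identity behind pointer arithmetic: element i lives at address base + i, so index 0 is simply the element at the base address itself. A sketch of the idea in Python, simulating word-addressed memory as a flat list (memory, base and index are my illustrative names, not BCPL syntax):

```python
# Simulate BCPL/C-style arrays: an 'array' is just a base address into
# flat memory, and indexing is pointer arithmetic: a!i is memory[a + i].
memory = [None] * 100   # flat, word-addressed 'memory'
base = 40               # address handed back by the allocator

def index(addr, i):
    # a[i] is *(a + i): zero-based, so no extra subtraction is needed
    return memory[addr + i]

memory[base + 0] = "first"   # a[0] sits exactly at the base address
memory[base + 4] = "last"    # an array of size 5 ends at index 4
print(index(base, 0), index(base, 4))
```

With 1-based indexing the compiler must compute base + i - 1 on every access (or quietly offset the stored base), which is exactly the extra work the Fortran and Algol compilers did on the programmer’s behalf.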
This was all because the target applications of BCPL were low-level system code.  Indeed, BCPL was intended to be a ‘bootstrap’ language (I think the first language where the compiler was written in itself) enabling a new compiler to be rapidly deployed on a new architecture. BCPL (and later C) was never intended for high-level applications such as scientific or commercial calculations, hence the lack of non-zero based arrays and proper multi-dimensional arrays.
This is evident in other areas beyond arrays. I once gave a C-language course at one of the big financial institutions. I used mortgage calculation as an example.  However, the participants quickly pointed out that it was not a very impressive example, as native integers were just too small for penny-accurate calculations of larger mortgages.  Even now with a 64 bit architecture, you still need to use flexible-precision libraries for major financial calculations, which came ‘for free’ in COBOL where numbers were declared at whatever precision you wanted.
Looking back with a HCI hat on, it is a little sad to see the way that programming languages have regressed from being oriented towards human understanding with the machine doing the work to transform that into machine instructions, towards languages far more oriented towards the machine with the human doing the translation 🙁   
Maybe it is time to change the tide.

 

 

Visiting the lost land

Last Sunday I finally got to see Troedrhiwfuwch, or perhaps strictly the site of Troedrhiwfuwch.

Troedrhiwfuwch, known as Troedy locally, was once a thriving mining village in the Rhymney valley.  Then in the 1980s the whole village was condemned and abandoned due to fear of landslips and now only two houses and the war memorial still stand to mark this lost village.

However, whilst lost on the ground, the community is still active, most of whom only recall the village from early childhood, or were born later and know it only from the stories of parents, grandparents, and others from the community.  There is a vibrant Facebook memories group where photos and stories are shared, and several members have been collecting photos and newspaper accounts and trawling through censuses and war records.

I’ve been working with the community for a few months partly funded by a Cherish-DE Knowledge Exchange.  We’ve been looking at how digital technology could help both preserve the legacy and also share it with visitors.

A central part of this has centred around the war memorial, partly because it serves as a tangible marker of the village where many gather each year on Armistice Day, and partly because the village sent so many young men in proportion to its size, more than one per household, many of whom never returned.

Until now all of our meetings have been remote.  We shared photographs, prototypes and stories and jokes, but all through the little windows of Zoom.  However, last Sunday I travelled through the quiet towns of mid Wales, up through the Brecon Beacons the kerbs lined with the parked cars of families enjoying the Bank Holiday sunshine, then back down into the Rhymney Valley and the tranquility of Troedrhiwfuwch.

Liz, Carys and Vince, who I’d been working with, were there together with a few others from the community, the local councillor Eluned Stenner, and Lisa, the Armed Forces Regional Officer.  They had already set up a tea, cake and biscuit station, complete with generator for the kettle – a combination of Valleys hospitality and Vince’s army background as an engineer meant we were well sorted.

I said ‘the tranquility of Troedrhiwfuwch’, but beside the War Memorial itself, it is anything but as the main valley road runs only feet away.  One of the worries that the community has had for many years is that the crowd gathering for the Armistice Day act of remembrance did so at very real risk of their own lives.  As soon as she saw this Eluned promised to ensure the road was closed for the next Armistice Day.

However, the tea table, and a second table covered in copies of many of the historic documents collected over the years, were in the Memorial Gardens, just a stone’s throw from the road and yet a haven of peace.

The Memorial Gardens are on the site of St Teilo’s Church, which was also torn down with the rest of the village in the 1980s, although all the contents of the inside of the church have been preserved in a side chapel in St Tyfaelog’s at Pontlottyn.

The aim is to plant a shrub with a memorial plaque for each of the war dead, several of whom have no other grave, to both act as a location for their families to visit and also as a memorial more broadly for the village.

On the table of resources you can see a few mock-ups of plaques with QR codes that link to information about each person.  I’ve helped the community create these, and we plan to make them contextual, so that, for example, a school group visiting can have information tailored to their age and curriculum.

 

The discussions with Eluned and Lisa suggested various funds that the community could apply for to enable work on the gardens and refurbishing the war memorial.  The community is also named in a substantial research proposal that Swansea University submitted in March that also included St Fagans and partners in Cork – so croesi bysedd for that!

Irrespective of particular grants (although that will help!), we will continue to work together.  As the outsider the lost village is a fascinating story, and I am constantly amazed at the knowledge, enthusiasm, and dedication of the community team working on the Troedrhiwfuwch archives.  With my technologist/researcher hat on, I’m also thinking about the potential digital tools and methods that could enable other communities to more easily preserve and share their own memories and stories.

In terms of digital technology, the next steps will include more ways to help link the digital archives to the physical location, including geocoding pictures, rather like the Nesta-funded Frasan app I was involved with on Tiree some years back. As well as the links to the world wars, the village is connected to human stories of industrial change, migration and sport, not to mention the geological features underlying the ‘moving mountain’, which eventually caused its demise.

In addition, there is less visible, but perhaps in the long term more critical, ‘back stage’ work in helping to connect and annotate the various photos and documents in the archive — linking stories to the objects.  Although the domains are rather different, I expect this aspect to intersect with work on democratising digitisation in the AHRC InterMusE project and also connect to other disciplines across Swansea University.

For more about Troedrhiwfuwch:

 

 

 

Online 1882 Gazetteer of Scotland


In the late 2000s, not long after moving to Tiree, I came across John Wilson’s 1882 Gazetteer of Scottish place names at the Internet Archive and thought it would be lovely if it were properly usable as an online resource.

For various reasons I never finished at the time, but over Easter I returned to the project and now have a full online version available, browsable page by page or entry by entry.  There is more work to be done to make it really usable, but it is a beginning.

I’m using this and other digitisation projects as ways to understand the kinds of workflows and tools to help others create their own digital resources based on archive materials.  In the InterMusE project, recently funded by AHRC, we are working with local musical societies to help them digitise their historic concert programs and other documents.

 

Fact checking Full Fact

It is hard to create accurate stories about numerical data.

Note: Even as I wrote this blog events have overtaken us.  The blog is principally about analysing how fact checking can go wrong; this will continue to be an issue, so it remains relevant.  But it is also about the specific issues with FullFact.org’s discussion of the community deaths that emerged from my own modelling of university returns.  Since Full Fact’s report, a new Bristol model has been published which confirms the broad patterns of my work, and university cases are already growing across the UK (e.g. Liverpool, Edinburgh) with lockdowns in an increasing number of student halls (e.g. Dundee).
It is of course nice to be able to say “I was right all along“, but in this case I wish I had been wrong.

A problem I’ve been aware of for some time is how much difficulty many media organisations have in formulating evidence and arguments, especially those involving numerical data.  Sometimes this is due to deliberately ‘spinning’ an issue, that is, the aim is distortion.  However, at other times, in particular on fact checking sites, it is clear that the intention is to offer the best information, but something goes wrong.

This is an important challenge for my own academic community; we clearly need to create better tools to help media and the general public understand numerical arguments.  This is particularly important for Covid and I’ve talked and written elsewhere about this challenge.

Normally I’ve written about this at a distance, looking at news items that concern other people, but over the last month I’ve found  myself on the wrong side of media misinterpretation or maybe misinformation.  The thing that is both most fascinating (with an academic hat on) and also most concerning is the failure in the fact-checking media’s ability to create reasoned argument.

This would merely be an interesting academic case study, were it not that the actions of the media put lives at risk.

I’ve tried to write succinctly, but what follows is still quite long.  To summarise I’m a great fan of fact checking sites such as Full Fact, but I wish that fact checking sites would:

  • clearly state what they are intending to check: a fact, data, statement, the implicit implications of the statement, or a particular interpretation of a statement.
  • where possible present concrete evidence or explicit arguments, rather than implicit statements or innuendo; or, if it is appropriate to express belief in one source rather than another do this explicitly with reasons.

However, I also realise how I need better ways to communicate my own work both numerical aspects, but also textually.  I realise that often behind every sentence, rather like an iceberg, there is substantial additional evidence or discussion points.

Context

I’d been contacted by Fullfact.org at the end of August in relation to the ‘50,000 deaths due to universities’ estimate that was analysed by WonkHE and then tweeted by UCU.  This was just before the work was briefly discussed on Radio 4’s More or Less … without any prior consultation or right of reply.  So full marks to Full Fact for actually contacting the primary source!

I gave the Full Fact journalist quite extensive answers including additional data.  However, he said that assessing the assumptions was “above his pay grade” and so, when I heard no more, I’d assumed that they had decided to abandon writing about it.

Last week on a whim, just before going on holiday, I thought to check and discovered that Fullfact.org had indeed published the story on 4th September; indeed it still has pride of place on their home page!

Sadly, they had neglected to tell me when it was published.

Front page summary – the claim

First of all let’s look at the pull out quote on the home page (as of 22nd Sept).

At the top the banner says “What was claimed”, appearing to quote from a UCU tweet and says (in quote marks):

The return to universities could cause 50,000 deaths from Covid-19 without “strong controls”

This is a slight (but critical) paraphrase of the actual UCU tweet, which quoted my own paper:

“Without strong controls, the return to universities would cause a minimum of 50,000 deaths.”

The addition of “from Covid-19” is filling in context.  Pedantically (but important for a fact checking site), by normal convention this would be set in some way to make clear it is an insertion into the original text, for example [from Covid-19].  More critically, the paraphrase inverts the sentence, thus making the conditional less easy to read, replaces “would cause a minimum” with “could cause”, and sets “strong controls” in scare quotes.

While the inversion does not change the logic, it does change the emphasis.  In my own paper and UCU’s tweet the focus on the need for strong controls comes first, followed by the implications if this is not done; whereas in the rewritten quote the conditional “without strong controls” appears more like an afterthought.

On the full page this paraphrase is still set as the claim, but the text also includes the original quote.  I have no idea why they chose to rephrase what was a simple statement to start with.

Front page summary – the verdict

It appears that the large text labelled ‘OUR VERDICT’ is intended to be a partial refutation of the original quote:

The article’s author told us the predicted death toll “will not actually happen in its entirety” because it would trigger a local or national lockdown once it became clear what was happening.

This is indeed what I said!  But I am still struggling to understand by what stretch of the imagination a national lockdown could be considered anything but “strong controls“.  However, while this is not a rational argument, it is a rhetorical one: emotionally, what appears to be a negative statement “will not actually happen” feels as though it weakens the original statement, even though it is perfectly consonant with it.

One of the things psychologists have known for a long time is that as humans we find it hard to reason with conditional rules (if–then) if they are either abstract or disagree with one’s intuition.  This lies at the heart of many classic psychological experiments such as the Wason card test.   Fifty thousand deaths solely due to universities is hard to believe, just like the original Covid projections were back in January and February, and so we find it hard to reason clearly.
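
As a purely illustrative sketch (using the classic textbook card faces, not data from any particular study), the Wason selection task can be expressed in a few lines of code: given the rule “if a card has a vowel on one side, it has an even number on the other”, only the cards that could falsify the rule need to be turned over.

```python
# Wason selection task (classic illustrative version): cards have a
# letter on one side and a number on the other; we see one face each.
# Rule under test: "if a card shows a vowel, the other side is even".
cards = ["E", "K", "4", "7"]

def must_turn(face: str) -> bool:
    """A card must be turned only if it could falsify the rule."""
    if face.isalpha():
        # A vowel might hide an odd number; consonants are irrelevant.
        return face.upper() in "AEIOU"
    # An odd number might hide a vowel; an even number proves nothing.
    return int(face) % 2 == 1

print([c for c in cards if must_turn(c)])  # -> ['E', '7']
```

The common error in the experiments is to turn the ‘4’, seeking confirmation rather than attempting to falsify the conditional; the plate example that follows trips us up in exactly the same way.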

In a more day-to-day example this is clear.

Imagine a parent says to their child, “if you’re not careful you’ll break half the plates”

The child replies, “but I am being careful”.

While this is in a way a fair response to the implied rider “... and you’re not being careful enough“, it is not an argument against the parent’s original statement.

When you turn to the actual Full Fact article this difficulty of reasoning becomes even more clear.  There are various arguments posed, but none that actually challenge the basic facts, rather statements of an emotional, rhetorical nature … just like the child’s response.

In fact if Full Fact’s conclusion had been “yes this is true, but we believe the current controls are strong enough so it is irrelevant“, then one might disagree with their opinion, but it would be a coherent argument.  However, this is NOT what the site claims, at least not in its headline statements.

A lack of alternative facts

To be fair to Full Fact, the most obvious way to check this estimated figure would have been to look at other models of university return and compare it with them.  It is clear such models exist, as SAGE describes discussions involving them, but neither SAGE nor indie-Sage‘s reports on university return include any estimated figure for overall impact.  My guess is that all such models end up with similar levels to those reported here and that the modellers feel they are simply too large to be believable … as indeed I did when I first saw the outcomes of my own modelling.

Between my own first modelling in June and writing the preprint article there was a draft report from a three-day virtual study group of mathematicians looking at University return, but other than this I was not aware of work in the public domain at the time.  For this very reason, my paper ends with a call “for more detailed modelling“.

Happily, in the last two weeks two pre-print papers have come from the modelling group at Bristol, one with a rapid review of University Covid models and one on their own model.  Jim Dickinson has produced another of his clear summaries of them both.  The Bristol model is far more complex than those that I used, including multiple types of teaching situation and many different kinds of students based on demographic and real social contact data.  It doesn’t include student–non-student infections, which I found critical in spread between households, but does include stronger effects for in-class contagion.  While these are very different types of modelling, the large-scale results of both suggest rapid spread within the student body.  The Bristol paper ends with a warning about potential spread to the local community, but does not attempt to quantify this, due to the paucity of data on student–non-student interactions.

Crucially, the lack of systematic asymptomatic testing will also make it hard to assess the level of Covid spread within the student population during the coming autumn and also hard to retrospectively assess the extent to which this was a critical factor in the winter Covid spread in the wider population.  We may come to this point in January and still not have real data.

Full page headlines

Following through to the full page on Full Fact, the paraphrased ‘claim’ is repeated with Full Fact’s ‘conclusion’ … which is completely different from the front page ‘OUR VERDICT’.

The ‘conclusion’ is carefully stated – rather like Boris Johnson’s careful use of the term ‘controlled by’ when describing the £350 million figure on the Brexit bus.  It does not say whether Full Fact believes the (paraphrased) claim; it merely makes a statement relating to it.  In fact, at the end of the article there is a rather more direct conclusion berating UCU for tweeting the figure.  That is, Full Fact do have a strong conclusion, and one that is far more directly related to the reason for fact checking this in the first place, but instead of stating this explicitly, the top-of-page headline ‘conclusion’ in some sense sits on the fence.

However, even this ‘sit on the fence’ statement is at the very least grossly misleading and in reality manifestly false.

The first sentence:

This comes from a research paper that has not been peer-reviewed

is correct, and one of the first things I pointed out when Full Fact contacted me.  Although the basic mathematics was read by a colleague, the paper itself has not been through formal peer review, and given the pace of change will need to be changed to be retrospective before it will be.  This said, in my youth I was a medal winner in the International Mathematical Olympiad and I completed my Cambridge mathematics degree in two years; so I do feel somewhat confident in the mathematics itself!  However, one of the reasons for putting the paper on the preprint site arXiv was to make it available for critique and further examination.

The second statement is not correct.  The ‘conclusion’ states that

It is based on several assumptions, including that every student gets infected, and nothing is done to stop it.

If you read the word “it” to refer to the specific calculation of 50,000 deaths then this is perhaps debatable.  However, the most natural reading is that “it” refers to the paper itself, and this interpretation is reinforced later in the Full Fact text, which says “the article [as in my paper] assumes …”.  This statement is manifestly false.

The paper as a whole models student bubbles of different sizes and assumes precisely the opposite: rapid spread only within bubbles.  That is, it explicitly assumes that something (bubbles) is done to stop it.  The outcome of the models, taking a wide range of scenarios, is that in most circumstances indirect infections (to the general population and back) led to all susceptible students being infected.  One can debate the utility or accuracy of the models, but crucially “every student gets infected” is a conclusion, not an assumption, of the models or the paper as a whole.

To be fair on Full Fact, this confusion between the fundamental assumptions of the paper and the specific values used for this one calculation echoes Kit Yates’ initial statements when he appeared on More or Less.  I’m still not sure whether that was a fundamental misunderstanding or a slip of the tongue during the interview, and my attempts to obtain clarification have failed.  However, I did explicitly point this distinction out to Full Fact.

The argument

The Full Fact text consists of two main parts.  One is labelled “Where did “50,000 deaths” come from?”, which is ostensibly a summary of my paper, but in reality seems to be where the clearest fact-check style statements are.  The second is labelled “But will this happen?”, which sounds as if it is the critique.  However, it is actually three short paragraphs: the first two effectively set me and Kit Yates head-to-head, and the third is the real conclusion, which says that UCU tweeted the quote without context.

Oddly I was never asked whether I believed that the UCU’s use of the statement was consistent with the way in which it was derived in my work.  This does seem a critical question given that Full Fact’s final conclusion is that UCU quoted it out of context.  Indeed, while Full Fact claims that UCU tweeted “the quote without context“, within the length of a tweet the UCU both included the full quote (not paraphrased!) and directly referenced Jim Dickinson’s summary of my paper on WonkHE, which itself links to my paper.  That is, the UCU tweet backed up the statement with links that lead to primary data and sources.

As noted, the actual reasoning is odd, as the body of the argument, to the extent it exists, appears to be in the section that summarises the paper.

First section – summary of paper

The first section “Where did “50,000 deaths” come from?”, starts off by summarising the assumptions underlying the 50,000 figure being fact checked and is the only section that links to any additional external sources.  Given the slightly askance way it is framed, it is hard to be sure, but it appears that this description is intended to cast doubt on the calculations because of the extent of the assumptions.  This is critical as it is the assumptions which Kit Yates challenged.

In several cases the assumptions stated are not what is said in the paper.  For example, Full Fact says the paper “assumes no effect from other measures already in place, like the Test and Trace system or local lockdowns” whereas the paragraph directly above the crucial calculation explicitly says that (in order to obtain a conservative estimate) the initial calculation will optimistically assume “social distancing plus track and trace can keep the general population R below 1 during this period“.  The 50,000 figure does not include additional, more extensive track and trace within the student community, but so far there is no sign of this happening beyond one or two universities adopting their own testing, and this is precisely one of the ‘strong controls’ that the paper explicitly suggests.

Ignoring these clear errors, the summary of assumptions made by the calculation of the 50,000 figure says that I “include the types of hygiene and social distancing measures already being planned, but not stronger controls” and then goes on to list the things not included.  It seems obvious, indeed axiomatic, that a calculation of what will happen “without strong controls” must assume for the purposes of the calculation that there are no strong controls.

The summary section also spends time on the general population R value of 0.7 used in the calculation and the implications of this.  The paragraph starts “In addition to this” and quotes that this is my “most optimistic” figure.  This is perfectly accurate … but the wording seems to imply this is perhaps (another!) unreasonable assumption … and indeed it is crazily low.  At the time (soon after lockdown) it was still hoped that non-draconian measures (such as track and trace) could keep R below 1, but of course we have seen rises far beyond this and the best estimates for the coming winter are now more like 1.2 to 1.5.

Note however that the statement was “Without strong controls, the return to universities would cause a minimum of 50,000 deaths.”  That is, the calculation deliberately took some mid-range estimates and some best-case ones in order to yield a lower-bound figure.  If one takes a more reasonable R the final figure would be a lot larger than 50,000.

Let’s think again of the child, but let’s make the child a stroppy teenager:

Parent, “if you’re not careful you’ll break half the plates

Child replies, throwing the pile of plates to the floor, “no I’ll break them all.”

The teenager might be making a point, but is not invalidating the parent’s statement.

Maybe I am misinterpreting the intent behind this section, but given the lack of any explicit fact-check evidence elsewhere, it seems reasonable to treat this as at least part of the argument for the final verdict.

Final section – critique of claim

As noted, the second section “But will this happen?”, which one would assume is the actual critique and mustering of evidence, consists of three paragraphs: one quoting me, one quoting Kit Yates of Bath, and one which appears to be the real verdict.

The first paragraph is the original statement that appeared as ‘OUR VERDICT’ on the front page, where I say that 50,000 deaths will almost certainly not occur in full because the government will be forced to take some sort of action once general Covid growth and death rates rise.  As noted, if this is not ‘strong controls‘, what is?

The second paragraph reports Kit Yates as saying there are some mistakes in my model and quotes him as generously saying that he’s “not completely damning the work”.  While I am grateful for his restraint, some minimal detail or evidence would be useful to assess his assertion.  On More or Less he questioned some of the values used, and I’ve addressed that previously; it is not clear whether this is what is meant by ‘mistakes’ here.  I don’t know if he gave any more information to Full Fact, but if he did, I have not seen it and Full Fact have not reported it.

A tale of three verdicts

As noted the ‘verdict’ on the Full Fact home page is different from the ‘conclusion’ at the top of the main fact-check page, and in reality it appears the very final paragraph of the article is the real ‘verdict’.

Given this confusion about what is actually being checked, it is no wonder the argument itself is somewhat confused.

The final paragraph, the Full Fact verdict itself has three elements:

  • that UCU did not tweet the quote in context – as noted perhaps a little unfair in a tweeted quote that links to its source
  • that the 50,000 “figure comes from a model that is open to question” – clearly it is questioned in Kit Yates’ quote, but this would have more force if it were backed by evidence.
  • that it is based on “predictions that will almost certainly not play out in the real world

The last of these is the main thrust of the ‘verdict’ quote on the Full Fact home page.  Indeed there is always a counterfactual element to any actionable prediction.  Clearly if the action is taken the prediction will change.  This is on the one hand deep philosophy, but also common sense.

The Imperial Covid model that prompted (albeit late) action by government in March gave a projection of between a quarter and a half million deaths within the year if the government continued a policy of herd immunity.  Clearly any reasonable government that believes this prediction will abandon herd immunity as a policy and indeed this appears to have prompted a radical change of heart.  Given this, one could have argued that the Imperial predictions “will almost certainly not play out in the real world“.  This is both entirely true and entirely specious.

The calculations in my paper and the quote tweeted by UCU say:

“Without strong controls, the return to universities would cause a minimum of 50,000 deaths.”

That is a conditional statement.

Going back to the child; the reason the parent says “if you’re not careful you’ll break half the plates“, is not as a prediction that half the plates will break, but as an encouragement to the child to be careful so that the plates will not break.  If the child is careful and the plates are not broken, that does not invalidate the parent’s warning.
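
In formal terms the parent’s warning is a conditional (material implication), which is falsified only when the condition holds and the consequence fails.  A tiny truth-table sketch (purely illustrative, with the plate example mapped onto boolean variables) makes this concrete:

```python
def implies(p: bool, q: bool) -> bool:
    """Truth value of the conditional 'if p then q' (material implication)."""
    return (not p) or q

# p = "the child is not careful", q = "half the plates break"
for not_careful in (True, False):
    for plates_break in (True, False):
        holds = implies(not_careful, plates_break)
        print(f"not careful={not_careful!s:<5}  plates break={plates_break!s:<5}  warning holds={holds}")
```

The conditional fails only in the case where the child is not careful and yet the plates do not break; whenever the child is careful the warning holds, whatever happens to the plates.  The same logic is why government action that averts the predicted deaths cannot invalidate the original conditional.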

Last words

Finally I want to reiterate how much I appreciate the role of fact checking sites including Full Fact, and also the fact-checking parts of other news sites such as BBC’s Reality Check; and I am sure the journalist here wanted to produce a factual article.  However, in order to be effective they need to be reliable.  We are all, and journalists especially, aware that an argument needs to be persuasive (rhetoric), but for fact checking, and indeed academia, arguments also need to be accurate and analytic (reason).

There are specific issues here and I am angered at some of the misleading aspects of this story because of the importance of the issues; there are literally lives at stake.

However, putting this aside, the story raises the challenge for me of how we can design tools and methods to help both those working on fact checking sites and the academic community to create and communicate clear and correct arguments.