Are five users enough?

[An edited version of this post is reproduced at HCIbook online!]

I recently got a Facebook message from a researcher about to do a study who asked, “Do you think 5 (users) is enough?”

Along with Fitts’ Law and Miller’s 7+/-2 this “five is enough” must be among the most widely cited, and most widely misunderstood and misused aphorisms of HCI and usability. Indeed, I feel that this post belongs more in ‘Myth Busters” than in my blog.

So, do I think five is enough? Of course, the answer is (annoyingly), “it depends”, sometimes, yes, five is enough, but sometimes, fewer: one appropriate user, or maybe no users at all, and sometimes you need more, maybe many more: 20, 100, 1000. But even when five is enough, the reasons are usually not directly related to Nielsen and Landauer’s original paper, which people often cite (although rarely read) and Nielsen’s “Why You Only Need to Test With 5 Users” alert box (probably more often actually read … well at least the title).

The “it depends” is partly dependent on the particular circumstances of the situation (types of people involved, kind of phenomenon, etc.), and partly on the kind of question you want to ask. The latter is the most critical issue, as if you don’t know what you want to know, how can you know how many users you need?

There are several sorts of reasons for doing some sort of user study/experiment, several of which may apply:

1. To improve a user interface (formative evaluation)

2. To assess whether it is good enough (summative evaluation)

3. To answer some quantitative question such as “what % of users are able to successfully complete this task”

4. To verify or refute some hypothesis such as “does eye gaze follow Fitts’ Law”

5. To perform a broad qualitative investigation of an area

6. To explore some domain or the use of a product in order to gain insight

It is common to see HCI researchers very confused about these distinctions, and effectively perform formative/summative evaluation in research papers (1 or 2) where in fact one of 3-6 is what is really needed.

I’ll look at each in turn, but first to note that, to the extent there is empirical evidence for ‘five is enough”, it applies to the first of these only.

I dealt with this briefly in my paper “Human–Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail” in the John Long Festschrift edition of IwC, and quote here:

In the case of the figure of five users, this was developed based on a combination of a mathematical model and empirical results (Nielsen and Landauer 1993). The figure of five users is:

(i) about the optimal cost/benefit point within an iterative development cycle, considerably more users are required for summative evaluation or where there is only the opportunity for a single formative evaluation stage;

(ii) an average over a number of projects and needs to be assessed on a system by system basis; and

(iii) based on a number of assumptions, in particular, independence of faults, that are more reasonable for more fully developed systems than for early prototypes, where one fault may mask another.

We’ll look at this in more detail below, but critically, the number ‘5’ is not a panacea, even for formative evaluation.

As important as the kind of question you are asking, are the kind of users you are using. So much of modern psychology is effectively the psychology of first year psychology undergraduates (in the in 1950s it was male prisoners). Is this representative? Does this matter? I’ll return to this at the end, but first of all look briefly at each kind of question.

Finally, there is perhaps the real question “will the reviewers of my work think five users is enough” — good publications vs. good science. The answer is that they will certainly be as influenced by the Myth of Five Users as you are, so do good science … but be prepared to need to educate your reviewers too!

formative evaluation – prototyping cycle

As noted formative evaluation was the scope of Nielsen and Landauer’s early work in 1993 that was then cited by Nielsen in his Alert Box in 2000, and which has now developed mythic status in the field.

The 1993 paper was assuming a context of iterative development where there would be many iterations, and asking how many users should be used per iteration, that is how many users should you test before fixing the problems found by those users, and then performing another cycle of user testing, and another. That is, in all cases they considered, the total number of users involved would be far more than five, it is just the number used in each iteration that was lower.

In order to calculate the optimal number of subjects to use per iteration, they looked at:

(i) the cost of performing a user evaluation

(ii) the number of new faults found (additional users will see many of the same faults, so there are diminishing returns)

(iii) the cost of a redevelopment cycle

All of these differ depending on the kind of project, so Nielsen and Landauer looked at a range of projects of differing levels of complexity. By putting them together, and combining with simple probabilistic models of bug finding in software, you can calculate an optimal number of users per experiment.

They found that, depending on the project, the statistics and costs varied and hence the optimal number of users/evaluators (between 7 and 21), with, on the whole, more complex projects (with more different kinds of bugs and more costly redevelopment cycles) having a higher optimal number than simpler projects. In fact all the numbers are larger than five, but five was the number in Nielsen’s earlier discount engineering paper, so the paper did some additional calculations that yielded a different kind of (lower) optimum (3.2 users — pity the last 0.2 user), with five somewhere between 7 and 3 … and a myth was born!

Today, with Web 2.0 and ‘perpetual beta’, agile methods and continuous deployment reduce redevelopment costs to near zero, and so Twidale and Marty argue for ‘extreme evaluation‘ where often one user may be enough (see also my IwC  paper).

The number also varies through the development process; early on, one user (indeed using it yourself) will find many, many faults that need to be fixed. Later faults become more obscure, or maybe only surface after long-term use.

Of course, if you use some sort of expert or heuristic evaluation, then the answer may be no real users at all!

And anyway all of this is about ‘fault finding’, usability is not about bug fixing but making things better, it is not at all clear how useful, if at all, literature on bug fixing is for creating positive experiences.

summative evaluation – is it good enough to release

If you are faced with a product and want to ask “is it good enough?” (which can mean, “are there any usability ‘faults’?”, or, “do people want to use it?”), then five users is almost certainly not enough. To give yourself any confidence of coverage of the kinds of users and kinds of use situations, you may need tens or hundreds of users, very like hypothesis testing (below).

However, the answer here may also be zero users. If the end product is the result of a continuous evaluation process with one, five or some other number of users per iteration, then the number of users who have seen the product during this process may be sufficient, especially if you are effectively iterating towards a steady state where few or no new faults are found per iteration.

In fact, even when there has been a continuous process, the need for long-term evaluation becomes more critical as the development progresses, and maybe the distinction between summative and late-stage formative is moot.

But in the end there is only one user you need to satisfy — the CEO … ask Apple.

quantitative questions and hypothesis testing

(Note, there are real numbers here, but if you are a numerophobe never fear, the next part will go back to qualitative issues, so bear with it!)

Most researchers know that “five is enough” does not apply in experimental or quantitative studies … but that doesn’t always stop them quoting numbers back!

Happily in dealing with more quantitative questions or precise yes/no ones, we can look to some fairly solid statistical rules for the appropriate number of users for assessing different kinds of effects (but do note “the right kind of users” below). And yes, very, very occasionally five may be enough!

Let’s imagine that our hypothesis is that a behaviour will occur in 50% of users doing an experiment. With five users, the probability that we will see this behaviour in at least one user is 1 in 32, which is approximately 3%. That is if we do not observe the behaviour at all, then we have a statistically significant result at 5% level (p<0.05) and can reject the hypothesis.

Note that there is a crucial difference between a phenomenon that we expect to see in about 50% of user iterations (i.e. the same user will do it about 50% of the time) and one where we expect 50% of people to do it all of the time. The former we can deal with using a small number of users and maybe longer or repeated experiments, the latter needs more users.

If instead, we wanted to show that a behaviour happens less than 25% of the time, then we need at least 11 users, for 10% 29 users. On the other hand, if we hypothesised that a behaviour happens 90% of the time and didn’t see it in just two users we can reject the hypothesis at significance level of 1%. In the extreme if our hypothesis is that something never happens and we see it with just one user, or if the hypothesis is that it always happens and we fail to see it with one user, in both cases we can reject our hypothesis.

The above only pertains when you see samples where either none or all of the users do something. More often we are trying to assess some number. Rather than “does this behaviour occur 50% of the time”, we are asking “how often does this behaviour occur”.

Imagine we have 100 users (a lot more than five!), and notice that 60% do one thing and 40% do the opposite. Can we conclude that in general the first thing is more prevalent? The answer is yes, but only just. Where something is a simple yes/no or either/or choice and we have counted the replies, we have a binomial distribution. If we have n (100) users and the probability of them answering ‘yes’ is p (50% if there is no real preference), then the maths says that the average number of times we expect to see a ‘yes’ response is n x p = 100 x 0.5 = 50 people — fairly obvious. It also says that the standard deviation of this count is sqrt(n x p x (1-p ) ) = sqrt(25) = 5. As a rule of thumb if answers differ by more than 2 standard deviations from the expected value, then this is statistically significant; so 60 ‘yes’ answers vs. the expected 50 is significant at 5%, but 55 would have just been ‘in the noise’.

Now drop this down to 10 users and imagine you have 7 ‘yes’s and 3 ‘no’s. For these users, in this experiment, they answer ‘yes’ more than twice as often as ‘no’, but here this difference is still no more than we might expect by chance. You need at least 8 to 2 before you can say anything more. For five users even 4 to 1 is quite normal (try tossing five coins and see how many come up heads); only if all or none do something can you start to think you are onto something!

For more complex kinds of questions such as “how fast”, rather than “how often”, the statistics becomes a little more complex, and typically more users are needed to gain any level of confidence.

As a rule of thumb some psychologists talk of 20 users per condition, so if you are comparing 4 things then you need 80 users. However,  this is just a rule of thumb and some phenomena have very high variability (e.g. problem solving) whereas others (such as low-level motor actions) are more repeatable for an individual and have more consistency between users. For phenomena with very high variability even 20 users per condition may be too few, although within subjects designs may help if possible. Pilot experiments or previous experiments concerning the same phenomenon are important, but this is probably the time to consult a statistician who can assess the statistical ‘power’ of a suggested design (the likelihood that it will reveal the issue of interest).

qualitative study

Here none of the above applies and … so … well … hmm how do you decide how many users? Often people rely on ‘professional judgement’, which is a posh way of saying “finger in the air”.

In fact, some of the numerical arguments above do still apply (sorry numerophobes). If as part of your qualitative study you are interested in a behaviour that you believe happens about half the time, then with five users you would be very unlucky not to observe it (3% of the time). Or put it another way, if you observe five users you will see around 97% of behaviours that at least half of all users have (with loads and loads of assumptions!).

If you are interested in rarer phenomena, then you need either lots more users (for behaviour that you only see in 1 in 10 users, then you have only a 40% chance of observing it with 5 users, and perhaps more surprisingly, only 65% chance of seeing it with 10 users).

However, if you are interested in a particular phenomenon, then randomly choosing people is not the way to go anyway, you are obviously going to select people who you feel are most likely to exhibit it; the aim is not to assess its prevalence in the world, but to find a few and see what happens.

Crucially when you generalise from qualitative results you do it differently.

Now in fact you will see many qualitative papers that add caveats to say “our results only apply to the group studied …”. This may be necessary to satisfy certain reviewers, but is at best disingenuous – if you really believe the results of your qualitative work do not generalise at all, then why are you publishing it – telling me things that I cannot use?

In fact, we do generalise from qualitative work, with care, noting the particular limitations of the groups studied, but still assume that the results are of use beyond the five, ten or one hundred people that we observed. However, we do not generalise through statistics, or from the raw data, but through reasoning that certain phenomena, even if only observed once, are likely to be ones that will be seen again, even if differing in details. We always generalise from our heads, not from data.

Whether it is one, five or more, by its nature deep qualitative data will involve fewer users than more shallow methods such as large scale experiments or surveys. I often find that the value of this kind of deep interpretative data is enhanced by seeing it alongside large-scale shallow data. For example, if survey or log data reveals that 70% of users have a particular problem and you observe two users having the same problem, then it is not unreasonable to assume that the reasons for the problem are similar to those of the large sample — yes you can generalise from two!

Indeed one user may be sufficient (as often happens with medical case histories, or business case studies), but often it is about getting enough users so that interesting things turn up.

exploratory study

This looking for interesting things is often the purpose of research: finding a problem to tackle. Once we have found an interesting issue, we may address it in many ways: formal experiments, design solutions, qualitative studies; but none of these are possible without something interesting to look at.

In such situations, as we saw with qualitative studies in general, the sole criteria for “is N enough” is whether you have learnt something.

If you want to see all, or most of the common phenomena, then you need lots of users. However, if you just want to find one interesting one, then you only need as many as gets you there. Furthermore whilst you often choose ‘representative or ‘typical’ users (what is a typical user!) for most kinds of study and evaluation, for exploratory analysis, often extreme users are most insightful; of course you have to work out whether your user or users are so unusual that the things you observe are unique to them … but again real research comes from the head, you have to think about it and make an assessment.

In the IwC paper I discuss some of the issues of single person studies in more detail and Fariza Razak’s thesis is all about this.

the right kind of users

If you have five, fifty or five hundred users, but they are all psychology undergraduates, they are not going to tell you much about usage by elderly users, or by young unemployed people who have left school without qualifications.

Again the results of research ultimately come from the head not the data: you will never get a complete typical, or representative sample of users; the crucial thing is to understand the nature of the users you are studying, and to make an assessment of whether the effects you see in them are relevant, and interesting more widely. If you are measuring reaction times, then education may not be a significant factor, but Game Boy use may be.

Many years ago I was approached by a political science PhD student. He had survey data from over 200 people (not just five!), and wanted to know how to calculate error bars to go on his graphs. This was easily done and I explained the procedure (a more systematic version of the short account given earlier). However, I was more interested in the selection of those 200 people. They were Members of Parliament; he had sent the survey to every MP (all 650 of them) and got over 200 replies, a 30% return rate, which is excellent for any survey. However, this was a self-selected group and so I was more interested in whether the grounds for self-selection influenced the answers than in how many of them there were. It is often the case that those with strong views on a topic are more likely to answer surveys on it. The procedure he had used was as good as possible, but, in order to be able to make any sort of statement about the interpretation of the data, he needed to make a judgement. Yet again knowledge is ultimately from the head not the data.

For small numbers of users these choices are far more critical. Do you try and choose a number of similar people, so you can contrast them, or very different so that you get a spread? There is no right answer, but if you imagine having done the study and interpreting the results this can often help you to see the best choice for your circumstances.

being practical

In reality whether choosing how many, or who, to study, we are driven by availability. It is nice to imagine that we make objective selections based on some optimal criteria — but life is not like that. In reality, the number and kind of users we study is determined by the number and kind of users we can recruit. The key thing is to understand the implications of these ‘choices’ and use these in your interpretation.

As a reviewer I would prefer honesty here, to know how and why users were selected so that I can assess the impact of this on the results. But that is a counsel of perfection, and again good science and getting published are not the same thing! Happily there are lovely euphemisms such as ‘convenience sample’ (who I could find) and ‘snowball sample’ (friends of friends, and friends of friends of friends), which allow honesty without insulting reviewers’ academic sensibilities.

in the end

Is five users enough? It depends: one, five, fifty or one thousand (Google test live with millions!). Think about what you want out of the study: numbers, ideas, faults to fix, and the confidence and coverage of issues you are interested in, and let that determine the number.

And, if I’ve not said it enough already, in the end good research comes from your head, from thinking and understanding the users, the questions you are posing, not from the fact that you had five users.

references

A. Dix (2010)  Human-Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail. Interacting with Computers, 22(1) pp. 13-27. http://www.hcibook.com/alan/papers/IwC-LongFsch-HCI-2010/

Nielsen, J. (1989). Usability engineering at a discount. In Salvendy, G., and Smith, M.J. (Eds.), Designing and Using Human–Computer Interfaces and Knowledge Based Systems, Elsevier Science Publishers, Amsterdam. 394-401.

Nielsen, J. and Landauer, T. K. 1993. A mathematical model of the finding of usability problems. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands, April 24 ? 29, 1993). CHI ’93. ACM, New York, NY, 206?213. http://doi.acm.org/10.1145/169059.169166

Jakob Nielsen’s Alertbox, March 19, 2000: Why You Only Need to Test With 5 Users. http://www.useit.com/alertbox/20000319.html

Fariza Razak (2008). Single Person Study: Methodological Issues. PhD Thesis. Computing Department, Lancaster University, UK. February 2008. http://www.hcibook.net/people/Fariza/

Michael Twidale and Paul Marty (2004-) Extreme Evaluation.  http://people.lis.uiuc.edu/~twidale/research/xe/

Web Art/Science Camp — how web killed the hypertext star and other stories

Had a great day on Saturday at the at the Web Art/Science Camp (twitter: #webartsci , lanyrd: web-art-science-camp). It was the first event that I went to primarily with my Talis hat on and first Web Science event, so very pleased that Clare Hooper told me about it during the DESIRE Summer School.

The event started on Friday night with a lovely meal in the restaurant at the British Museum. The museum was partially closed in the evening, but in the open galleries Rosetta Stone, Elgin Marbles and a couple of enormous totem poles all very impressive. … and I notice the BM’s website when it describes the Parthenon Sculptures does not waste the opportunity to tell us why they should not be returned to Greece!

Treasury of Atreus

I was fascinated too by images of the “Treasury of Atreus” (which is actually a Greek tomb and also known as the Tomb of Agamemnon. The tomb has a corbelled arch (triangular stepped stones, as visible in the photo) in order to relieve load on the lintel. However, whilst the corbelled arch was an important technological innovation, the aesthetics of the time meant they covered up the triangular opening with thin slabs of fascia stone and made it look as though lintel was actually supporting the wall above — rather like modern concrete buildings with decorative classical columns.

how web killed the hypertext star

On Saturday, the camp proper started with Paul de Bra from TU/e giving a sort of retrospective on pre-web hypertext research and whether there is any need for hypertext research anymore. The talk brought out several of the issues that have worried me also for some time; so many of the lessons of the early hypertext lost in the web1.

For me one of the most significant issues is external linkage. HTML embeds links in the document using <a> anchor tags, so that only the links that the author has thought of can be present (and only one link per anchor). In contrast, mature pre-web hypertext systems, such as Microcosm2, specified links eternally to the document, so that third parties could add annotation and links. I had a few great chats about this with one of the Southampton Web Science DTC students; in particular, about whether Google or Wikipedia effectively provide all the external links one needs.

Paul’s brief history of hypertext started, predictably, with Vannevar Bush‘s  “As We May Think” and Memex; however he pointed out that Bush’s vision was based on associative connections (like the human mind) and trails (a form of narrative), not pairwise hypertext links. The latter reminded me of Nick Hammond’s bus tour metaphor for guided educational hypertext in the 1980s — occasionally since I have seen things a little like this, and indeed narrative was an issue that arose in different guises throughout the day.

While Bush’s trails are at least related to the links of later hypertext and the web, the idea of associative connections seem to have been virtually forgotten.  More recently in the web however, IR (information retrieval) based approaches for page suggestions like Alexa and content-based social networking have elements of associative linking as does the use of spreading activation in web contexts3

It was of course Nelson who coined the term hypertext, but Paul reminded us that Ted Nelson’s vision of hypertext in Xanadu is far richer than the current web.  As well as external linkage (and indeed more complex forms in his ZigZag structures, a form of faceted navigation.), Xanadu’s linking was often in the form of transclusions pieces of one document appearing, quoted, in another. Nelson was particularly keen on having only one copy of anything, hence the transclusion is not so much a copy as a reference to a portion. The idea of having exactly one copy seems a bit of computing obsession, and in non-technical writing it is common to have quotations that are in some way edited (elision, emphasis), but the core thing to me seems to be the fact that the target of a link as well as the source need not be the whole document, but some fragment.

Paul de Bra's keynote at Web Art/Science Camp (photo Clare Hooper)

Over a period 30 years hypertext developed and started to mature … until in the early 1990s came the web and so much of hypertext died with its birth … I guess a bit like the way Java all but stiltified programming languages. Paul had a lovely list of bad things about the web compared with (1990s) state of the art hypertext:

Key properties/limitations in the basic Web:

  1. uni-directional links between single nodes
  2. links are not objects (have no properties of their own)
  3. links are hardwired to their source anchor
  4. only pre-authored link destinations are possible
  5. monolithic browser
  6. static content, limited dynamic content through CGI
  7. links can break
  8. no transclusion of text, only of images

Note that 1, 3 and 4 are all connected with the way that HTML embeds links in pages rather than adopting some form of external linkage. However, 2 is also interesting; the fact that links are not ‘first class objects’. This has been preserved in the semantic web where an RDF triple is not itself easily referenced (except by complex ‘reification’) and so it is hard to add information about relationships such as provenance.

Of course, this same simplicity (or even that it was simplistic) that reduced the expressivity of HTML compared with earlier hypertext is also the reasons for its success compared with earlier more heavy weight and usually centralised solutions.

However, Paul went on to describe how many of the features that were lost have re-emerged in plugins, server enhancements (this made me think of systems such as zLinks, which start to add an element of external linkage). I wasn’t totally convinced as these features are still largely in research prototypes and not entered the mainstream, but it made a good end to the story!

demos and documentation

There was a demo session as well as some short demos as part of talks. Lots’s of interesting ideas. One that particularly caught my eye (although not incredibly webby) was Ana Nelson‘s documentation generator “dexy” (not to be confused with doxygen, another documentation generator). Dexy allows you to include code and output, including screen shots, in documentation (LaTeX, HTML, even Word if you work a little) and live updates the documentation as the code updates (at least updates the code and output, you need to change the words!). It seems to be both a test harness and multi-version documentation compiler all in one!

I recall that many years ago, while he was still at York, Harold Thimbleby was doing something a little similar when he was working on his C version of Knuth’s WEB literate programming system. Ana’s system is language neutral and takes advantage of recent developments, in particular the use of VMs to be able to test install scripts and to be sure to run code in a consistent environments. Also it can use browser automation for web docs — very cool 🙂

Relating back to Paul’s keynote this is exactly an example of Nelson’s transclusion — the code and outputs included in the document but still tied to their original source.

And on this same theme I demoed Snip!t as an example of both:

  1. attempting to bookmark parts of web pages, a form of transclusion
  2. using data detectors a form of external linkage

Another talk/demo also showed how Compendium could be used to annotate video (in the talk regarding fashion design) and build rationale around … yet another example of external linkage in action.

… and when looking after the event at some of Weigang Wang‘s work on collaborative hypermedia it was pleasing to see that it uses a theoretical framework for shared understanding in collaboratuve hypermedia that builds upon my own CSCW framework from the early 1990s 🙂

sessions: narrative, creativity and the absurd

Impossible to capture in a few words, but one session included different talks and discussion about the relation of narrative and various forms of web experiences — including a talk on the cognitive psychology of the Kafkaesque. Also discussion of creativity with Nathan live recording in IBIS!

what is web science

I guess inevitably in a new area there was some discussion about “what is web science” and even “is web science a discipline”. I recall similar discussions about the nature of HCI 25 years ago and not entirely resolved today … and, as an artist who was there reminded us, they still struggle with “what is art?”!

Whether or not there is a well defined discipline of ‘web science’, the web definitely throws up new issues for many disciplines including new challenges for computing in terms of scale, and new opportunities for the social sciences in terms of intrinsically documented social interactions. One of the themes that recurred to distinguish web science from simply web technology is the human element — joy to my ears of course as a HCI man, but I think maybe not the whole story.

Certainly the gathering of people from different backgrounds in a sort of disciplinary bohemia is exciting whether or not it has a definition.

  1. see also “Names, URIs and why the web discards 50 years of computing experience“[back]
  2. Wendy Hall, Hugh Davis and Gerard Hutchings, “Rethinking Hypermedia:: The Microcosm Approach, Springer, 1996.[back]
  3. Spreading activation is used by a number of people, some of my own work with others at Athens, Rome and Talis is reported in “Ontologies and the Brain: Using Spreading Activation through Ontologies to Support Personal Interaction” and “Spreading Activation Over Ontology-Based Resources: From Personal Context To Web Scale Reasoning“.[back]

UK internet far from ubiquitous

On the last page of the Guardian on Saturday (13th Oct) in a sort of ‘interesting numbers’ section, they say that:

“30% of the UK population have no internet access at home”

I couldn’t find the exact source of this, however, another  guardian article “UK internet audience rises by 1.9 million over last year” dated Wednesday 30 June 2010 has a similar figure.  This says that Internet use  has grown to 38.8 million. The National Statistics office say the overall UK population is 61,792,000 with 1/5 under 16, so call that 2 in 16 under 10 or around 8 million. That gives an overall population of a little under 54 million over 10 years old, that is still only 70% actually using the web at all.

My guess is that some of the people with internet at home do not use it, and some of the ones without home connections use it using other means (mobile, use at school, cyber cafe’s), but by both measures we are hardly a society where the web is as ubiquitous as one might have imagined.

Struggling with Heidegger

Heidegger and hammers have been part of HCI’s conceptualisation from pretty much as long as I can recall.  Although maybe I first heard the words at some sort of day workshop in the late 1980s as the hammer example as used in HCI annoyed me even then, so let’s start with hammers.

hammers

I should explain that problems with the hammer example are not my current struggles with Heidegger!  For the hammer it is just that Heidegger’s ‘ready at hand’ is often confused with ‘walk up and use’.  In  Heidegger ready-at-hand refers to the way one is focused on the nail, or wood to be joined, not the hammer itself:

“The work to be produced is the “towards which” of such things as the hammer, the plane, and the needle” (Being and Time1, p.70/99)

To be ‘ready to hand’ like this typically requires familiarity with the equipment (another big Heidegger word!), and is very different from the way a cash machine or tourist information systems should be in some ways accessible independent of prior knowledge (or at least only generic knowledge and skills).

My especial annoyance with the hammer example stems from the fact that my father was a carpenter and I reckon it took me around 10 years to learn how to use a hammer properly2!  Even holding it properly is not obvious, look at the picture.

There is a hand sized depression in the middle.  If you have read Norman’s POET you will think, “ah yes perceptual affordance’, and grasp it like this:

But no that is not the way to hold it!  If try to use it like this you end up using the strength of your arm to knock in the nail and not the weight of the hammer.

Give it to a child, surely the ultimate test of ‘walk up and use’, and they often grasp the head like this.

In fact this is quite sensible for a child as a ‘proper’ grip would put too much strain on their wrist.  Recall  Gibson’s definition of affordance was relational3, about the ecological fit between the object and the potential actions, and the actions depends on who is doing the acting.  For a small child with weaker arms the hammer probably only affords use at all with this grip.

In fact the ‘proper’ grip is to hold it quite near the end where you can use the maximum swing of the hammer to make most use of the weight of the hammer and its angular momentum:

Anyway, I think maybe Heidegger knew this even if many who quote him don’t!

Heidegger

OK, so its alright me complaining about other people mis-using Heidegger, but I am in the middle of writing one of the chapters for TouchIT and so need to make sure I don’t get it wrong myself … and there my struggles begin.  I need to write about ready-to-hand and present-to-hand.   I thought I understood them, but always it has been from secondary sources and as I sat with Being and Time in one hand, my Oxford Companion to Philosophy in another and various other books in my teeth … I began to doubt.

First of all what I thought the distinction was:

  • ready at hand — when you are using the tool and it is invisible to you, you just focus on the work to be done with it
  • present at hand — when there is some sort of breakdown, the hammer head is loose or you don’t have the right tool to hand and so start to focus on the tools themsleves rather than on the job at hand

Scanning the internet this is certainly what others think, for example blog posts at 251 philosophy and Matt Webb at Berg4.  Koschmann, Kuutti and Hickman produced an excellent comparison of breakdown in Heidegger, Leont’ev and Dewey5, and from this it looks as though the above distinction maybe comes Dreyfus summary of Heidegger — but again I don’t have a copy of Dreyfus’ “Being-in-the-World“, so not certain.

Now this is an important distinction, and one that Heidegger certainly makes.  The first part is very clearly what Heidegger means by ready-to-hand:

“The peculiarity of what is proximally to hand is that, in its readiness-to-hand, it must, as it were, withdraw … that with which we concern ourselves primarily is the work …” (B&T, p.69/99)

The second point Heidegger also makes at length distinguishing at least three kinds of breakdown situation.  It just seems a lot less clear whether ‘present-at-hand’ is really the right term for it.  Certainly the ‘present-at-hand’ quality of an artefact becomes foregrounded during breakdown:

“Pure presence at hand announces itself in such equipment, but only to withdraw to the readiness-in-hand with which one concerns oneself — that is to say, of the sort of thing we find when we put it back into repair.” (B&T, p.73/103)

But the preceeding sentance says

“it shows itself as an equipmental Thing which looks so and so, and which, in its readiness-to-hand as looking that way, has constantly been present-at-hand too.” (B&T, p.73/103)

That is present-at-hand is not so much in contrast to ready-at-hand, but in a sense ‘there all along’; the difference is that during breakdown the presence-at-hand becomes foregrounded. Indeed when ‘present-at-hand’ is first introduced Heidegger appears to be using it as a binary distinction between Dasein, (human) entities that exist and ponder their existence, and other entities such as a table, rock or tree (p. 42/67).  The contrast is not so much between ready-to-hand and present-to-hand, but between ready-to-hand and ‘just present-at-hand’ (p.71/101) or ‘Being-just-present-at-hand-and-no-more’ (p.73/103). For Heidegger to seems not so much that ‘ready-to-hand’ stands in in opposition to ‘present-to-hand’; it is just more significant.

To put this in context, traditional philosophy had focused exclusively on the more categorically defined aspects of things as they are in the world (‘existentia’/present-at-hand), whilst ignoring the primary way they are encountered by us (Dasein, real knowing existence) as ready-to-hand, invisible in their purposefulness.  Heidegger seeks to redress this.

“If we look at Things just ‘theoretically’, we can get along without understanding readiness-to-hand.” (B&T p.69/98)

Heidegger wants to avoid the speculation of previous science and philosophy. Although it is not a Heidegger word, I use ‘speculation’ here with all of its connotations, pondering at a distance, but without commitment, or like spectators at a sports stadium looking in at something distant and other.  In contrast, ready-to-hand suggests commitment, being actively ‘in the world’ and even when Heidegger talks about those moments when an entity ceases to be ready-to-hand and is seen as present-to-hand, he uses the term circumspection — a casting of the eye around, so that the Dasein, the person, is in the centre.

So present-at-hand is simply the mode of being of the entities that are not Dasein (aware of their own existence), but our primary mode of experience of them and thus in a sense the essence of their real existence is when they are ready-to-hand.  I note Roderick Munday’s useful “Glossary of Terms in Being and Time” highlights just this broader sense of present-at-hand.

Maybe the confusion arises because Heidegger’s concern is phenomenological and so when an artefact is ready-to-hand and its presence-to-hand ‘withdraws’, in a sense it is no longer present-to-hand as this is no longer a phenomenon; and yet he also seems to hold a foot in realism and so in another sense it is still present-to-hand.  In discussing this tension between realism and idealism in Heidegger, Stepanich6 distinguishes present-at-hand and ready-to-hand, from presence-to-hand and readiness-to-hand — however no-one else does this so maybe that is a little too subtle!

To end this section (almost) with Heidegger’s words, a key statement, often quoted, seems to say precisely what I have argued above, or maybe precisely the opposite:

“Yet only by reason of something present-at-hand ‘is there’ anything ready-to-hand.  Does it follow, however, granting this thesis for the nonce, that readiness-to-hand is ontologically founded upon presence-at-hand?” (B&T, p.71/101)

What sort of philosopher makes a key point through a rhetorical question?

So, for TouchIT, maybe my safest course is to follow the example of the Oxford Companion to Philosophy, which describes ready-to-hand, but circumspectly never mentions present-to-hand at all?

and anyway what’s wrong with …

On a last note there is another confusion, or maybe mistaken attitude, that seems to be common when referring to ready-to-hand.  Heidegger’s concern was in ontology, understanding the nature of being, and so he asserted the ontological primacy of the ready-to-hand, especially in light of the previous dominant concerns of philosophy.  However, in HCI, where we are interested not in the philosophical question, but the pragmatic one of utility, usability, and experience, Heidegger is often misapplied as a kind of fetishism of engagement, as if everything should be ready-to-hand all the time.

Of course for many purposes this is correct, as I type I do not want to be aware of the keys I press, not even of the pages of the book that I turn.

Yet there is also a merit in breaking this engagement, to encourage reflection and indeed the circumspection that Heidegger discusses.  Indeed Gaver et al.’s focus on ambiguity in design7 is often just to encourage that reflection and questioning, bringing things to the foreground that were once background.

Furthermore as HCI practitioners and academics we need to both take seriously the ready-to-hand-ness of effective design, but also (just as Heidegger is doing) actually look at the ready-to-hand-ness of things seeing them and their use not taking them for granted.  I constantly strive to find ways to become aware of the mundane, and offer students tools for estrangement to look at the world askance8.

“To lay bare what is  just present-at-hand and no more, cognition must first penetrate beyond what is ready-to-hand in our concern.” (B&T, p.71/101)

This ability to step out and be aware of what we are doing is precisely the quality that Schon recognises as being critical for the ‘Reflective Practioner‘.  Indeed, my practical advice on using the hammer in the footnotes below comes precisely through reflection on hammering, and breakdowns in hammering, not through the times when the hammer was ready-to-hand..

Heidegger is indeed right that our primary existence is being in the world, not abstractly viewing it from afar.  And yet, sometimes also, just as Heidegger himself did as he pondered and wrote about these issues, one of our crowning glories as human beings is precisely that we are able also in a sense to step outside ourselves and look in wonder.

  1. In common with much of the literature the page references to Being and Time are all of the form p.70/99 where the first number refers to the page numbers in the original German (which I have not read!) and the second number to the page in Macquarrie and Robinson’s translation of Being and Time published by Blackwell.[back]
  2. Practical hammering – a few tips: The key thing is to focus on making sure the face of the hammer is perpendicular to the nail, if there is a slight angle the nail will bend.  For thin oval wire nails, if one does bend do not knock the nail back upright, most likely it will simply bend again and just snap.  Instead, simply hit the head of the nail while still bent, but keeping the hammer face perpendicular to the nail not the hole.  So long as the nail has cut any depth of hole it will simply follow its own path and straighten of its own accord.[back]
  3. James Gibson. The Ecological Approach to Visual Perception[back]
  4. Matt Webb’s post appears to be quoting Paul Dourish’ “Where the Action Is”, but I must have lent my copy to someone, so not sure of this is really what Paul thinks.[back]
  5. Koschmann, T., Kuutti, K. & Hickman, L. (1998). The Concept of Breakdown in Heidegger, Leont’ev, and Dewey and Its Implications for Education. Mind, Culture, and Activity, 5(1), 25-41. doi:10.1207/s15327884mca0501_3[back]
  6. Lambert Stepanich. “Heidegger: Between Idealism and Realism“, The Harvard Review of Philosophy, Vol 1. Spring 1991.[back]
  7. Bill Gaver, Jacob Beaver, and Steve Benford, 2003. Ambiguity as a resource for design. CHI ’03.[back]
  8. see previous posts on “mirrors and estrangement” and “the ordinary and the normal“[back]

Italian conferences: PPD10, AVI2010 and Search Computing

I got back from trip to Rome and Milan last Tuesday, this included the PPD10 workshop that Aaron, Lucia, Sri and I had organised, and the AVI 2008 conference, both in University of Rome “La Sapienza”, and a day workshop on Search Computing at Milan Polytechnic.

PPD10

The PPD10 workshop on Coupled Display Visual Interfaces1 followed on from a previous event, PPD08 at AVI 2008 and also a workshop on “Designing And Evaluating Mobile Phone-Based Interaction With Public Displays” at CHI2008.  The linking of public and private displays is something I’ve been interested in for some years and it was exciting to see some of the kinds of scenarios discussed at Lancaster as potential futures some years ago now being implemented over a range of technologies.  Many of the key issues and problems proposed then are still to be resolved and new ones arising, but certainly it seems the technology is ‘coming of age’.  As well as much work filling in the space of interactions, there were also papers that pushed some of the existing dimensions/classifications, in particular, Rasmus Gude’s paper on “Digital Hospitality” stretched the public/private dimension by considering the appropriation of technology in the home by house guests.  The full proceedings are available at the PPD10 website.

AVI 2010

AVI is always a joy, and AVI 2010 no exception; a biennial, single-track conference with high-quality papers (20% accept rate this year), and always in lovely places in Italy with good food and good company!  I first went to AVI in 1996 when it was in Gubbio to give a keynote “Closing the Loop: modelling action, perception and information“, and have gone every time since — I always say that Stefano Levialdi is a bit like a drug pusher, the first experience for free and ever after you are hooked! The high spot this year was undoubtedly Hitomi Tsujita‘s “Complete fashion coordinator2, a system for using social networking to help choose clothes to wear — partly just fun with a wonderful video, but also a very thoughtful mix of physical and digital technology.


images from Complete Fashion Coordinator

The keynotes were all great, Daniel Keim gave a really lucid state of the art in Visual Analytics (more later) and Patrick Lynch a fresh view of visual understanding based on many years experience and highlighting particularly on some of the more immediate ‘gut’ reactions we have to interfaces.  Daniel Wigdor gave an almost blow-by-blow account of work at Microsoft on developing interaction methods for next-generation touch-based user interfaces.  His paper is a great methodological exemplar for researchers combining very practical considerations, more principled design space analysis and targeted experimentation.

Looking more at the detail of Daniel’s work at Microsoft, it is interesting that he has a harder job than Apple’s interaction developers.  While Apple can design the hardware and interaction together, MS as system providers need to deal with very diverse hardware, leading to a ‘least common denominator’ approach at the level of quite basic touch interactions.  For walk-up-and use systems such as Microsoft Surface in bar tables, this means that users have a consistent experience across devices.  However, I did wonder whether this approach which is basically the presentation/lexical level of Seeheim was best, or whether it would be better to settle at some higher-level primitives more at the Seeheim dialog level, thinking particularly of the way the iPhone turns pull down menus form web pages into spinning selectors.  For devices that people own it maybe that these more device specific variants of common logical interactions allow a richer user experience.

The complete AVI 2010 proceedings (in colour or B&W) can be found at the conference website.

The very last session of AVI was a panel I chaired on “Visual Analytics: people at the heart of data” with Daniel Keim, Margit Pohl, Bob Spence and Enrico Bertini (in the order they sat at the table!).  The panel was prompted largely because the EU VisMaster Coordinated Action is producing a roadmap document looking at future challenges for visual analytics research in Europe and elsewhere.  I had been worried that it could be a bit dead at 5pm on the last day of the conference, but it was a lively discussion … and Bob served well as the enthusiastic but also slightly sceptical outsider to VisMaster!

As I write this, there is still time (just, literally weeks!) for final input into the VisMaster roadmap and if you would like a draft I’ll be happy to send you a PDF and even happier if you give some feedback 🙂

Search Computing

I was invited to go to this one-day workshop and had the joy to travel up on the train from Rome with Stu Card and his daughter Gwyneth.

The search computing workshop was organised by the SeCo project. This is a large single-site project (around 25 people for 5 years) funded as one of the EU’s ‘IDEAS Advanced Grants’ supporting ‘investigation-driven frontier research’.  Really good to see the EU funding work at the bleeding edge as so many national and European projects end up being ‘safe’.

The term search computing was entirely new to me, although instantly brought several concepts to mind.  In fact the principle focus of SeCo is the bringing together of information in deep web resources including combining result rankings; in database terms a form of distributed join over heterogeneous data sources.

The work had many personal connections including work on concept classification using ODP data dating back to aQtive days as well as onCue itself and Snip!t.  It also has similarities with linked data in the semantic web word, however with crucial differences.  SeCo’s service approach uses meta-descriptions of the services to add semantics, whereas linked data in principle includes a degree of semantics in the RDF data.  Also the ‘join’ on services is on values and so uses a degree of run-time identity matching (Stu Card’s example was how to know that LA=’Los Angeles’), whereas linked data relies on URIs so (again in principle) matching has already been done during data preparation.  My feeling is that the linking of the two paradigms would be very powerful, and even for certain kinds of raw data, such as tables, external semantics seems sensible.

One of the real opportunities for both is to harness user interaction with data as an extra source of semantics.  For example, for the identity matching issue, if a user is linking two data sources and notices that ‘LA’ and ‘Los Angeles’ are not identified, this can be added as part of the interaction to serve the user’s own purposes at that time, but by so doing adding a special case that can be used for the benefit of future users.

While SeCo is predominantly focused on the search federation, the broader issue of using search as part of algorithmics is also fascinating.  Traditional algorithmics assumes that knowledge is basically in code or rules and is applied to data.  In contrast we are seeing the rise of web algorithmics where knowledge is garnered from vast volumes of data.  For example, Gianluca Demartini at the workshop mentioned that his group had used the Google suggest API to extend keywords and I’ve seen the same trick used previously3.  To some extent this is like classic techniques of information retrieval, but whereas IR is principally focused on a closed document set, here the document set is being used to establish knowledge that can be used elsewhere.  In work I’ve been involved with, both the concept classification and folksonomy mining with Alessio apply this same broad principle.

The slides from the workshop are appearing (but not all there yet!) at the workshop web page on the SeCo site.

  1. yes I know this doesn’t give ‘PPD’ this stands for “public and private displays”[back]
  2. Hitomi Tsujita, Koji Tsukada, Keisuke Kambara, Itiro Siio, Complete Fashion Coordinator: A support system for capturing and selecting daily clothes with social network, Proceedings of the Working Conference on Advanced Visual Interfaces (AVI2010), pp.127–132.[back]
  3. The Yahoo! Related Suggestions API offers a similar service.[back]

Apple’s Model-View-Controller is Seeheim

Just reading the iPhone Cocoa developer docs and its description of Model-View-Controller. However, if you look at the diagram rather than the model component directly notifying the view of changes as in classic MVC, in Cocoa the controller acts as mediator, more like the Dialogue component in the Seeheim architecture1 or the Control component in PAC.

MVC from Mac Cocoa development docs

The docs describing the Cocoa MVC design pattern in more detail in fact do a detailed comparison with the Smalltalk MVC, but do not refer to Seeheim or PAC, I guess because they are less well known now-a-days.  Only a few weeks ago when discussing architecture with my students, I described Seeheim as being more a conceptual architecture and not used in actual implementations now.  I will have to update my lectures – Seeheim lives!

  1. Shocked to find no real web documentation for Seeheim, not even on Wikipedia; looks like CS memory is short.  However, it is described in chapter 8 of the HCI book and in the chapter 8 slides[back]

a new version of … on downgrades and preferences

I’m wondering why people break things when they create new versions.

Firefox used to open a discreet little window when you downloaded papers.  Now-a-days it opens a full screen window completely hiding the browser.

A minor issue, but makes me wonder about both new versions and also defaults and personalisation in general.

Continue reading

Crash Report

You would think crash reporting would be made as seamless and helpful as possible, after all your product has just failed in some way and you wish (a) to mollify the user; and (b) to solicit their assistance in obtaining a full report.

You would think …

In the following I will reflect on what goes wrong in Adobe’s crash reporting and some  lessons we can learn from it.

Continue reading

Language and Action (2): from observation to communication

Years ago I wrote a short CHI paper with Roberta Mancini and Stefano Levialdi “communication, action and history” all about the differences between language and action, but for the second time in a few weeks I am writing about the links. But of course there are both similarities and differences.

In my recent post about “language and action: sequential associative parsing“, I compared the role of semantics in the parsing of language with the similar role semantics plays in linking disparate events in our interpretation of the world and most significantly the actions of others. The two differ however in that language is deliberative, intentionally communicative, and hence has a structure, a rule-iness resulting from conventions; it is chosen to make it easier for the recipient to interpret. In contrast, the events of the world have structure inherent in their physical nature, but do not structure themselves in order that we may interpret them, their rule-iness is inherent not intentional. However, the actions of other people and animals often fall between the two.

In this post I will focus in on individual actions of creatures in the world and the way that observing others tells us about their current activities and even their intended actions, and thus how these observations becomes a resource for planning our own actions. However, our own actions are also the subject of observation and hence available to others. We may deliberately hide or obfuscate our intentions and actions if we do not wish others to ‘read’ what we are doing; however, we may also exaggerate them, making them more obvious when we are collaborating. That is, we shape our actions in the light of their potential observation by others so that they become an explicit communication to them.

This exaggeration is evident in computer environments and the physical world, and may even be the roots of iconic gesture and hence language itself.

Continue reading