Are five users enough?

[An edited version of this post is reproduced at HCIbook online!]

I recently got a Facebook message from a researcher about to do a study who asked, “Do you think 5 (users) is enough?”

Along with Fitts’ Law and Miller’s 7±2, this “five is enough” must be among the most widely cited, and most widely misunderstood and misused, aphorisms of HCI and usability. Indeed, I feel that this post belongs more in ‘Myth Busters’ than in my blog.

So, do I think five is enough? Of course, the answer is (annoyingly) “it depends”. Sometimes, yes, five is enough; sometimes fewer: one appropriate user, or maybe no users at all; and sometimes you need more, maybe many more: 20, 100, 1000. But even when five is enough, the reasons are usually not directly related to Nielsen and Landauer’s original paper, which people often cite (although rarely read), and Nielsen’s “Why You Only Need to Test With 5 Users” Alertbox (probably more often actually read … well, at least the title).

The “it depends” depends partly on the particular circumstances of the situation (the types of people involved, the kind of phenomenon, etc.), and partly on the kind of question you want to ask. The latter is the most critical issue: if you don’t know what you want to know, how can you know how many users you need?

There are several kinds of reasons for doing a user study or experiment, more than one of which may apply:

1. To improve a user interface (formative evaluation)

2. To assess whether it is good enough (summative evaluation)

3. To answer some quantitative question such as “what % of users are able to successfully complete this task”

4. To verify or refute some hypothesis such as “does eye gaze follow Fitts’ Law”

5. To perform a broad qualitative investigation of an area

6. To explore some domain or the use of a product in order to gain insight

It is common to see HCI researchers very confused about these distinctions, effectively performing formative/summative evaluation (1 or 2) in research papers where in fact one of 3–6 is what is really needed.

I’ll look at each in turn, but first note that, to the extent there is empirical evidence for ‘five is enough’, it applies only to the first of these.

I dealt with this briefly in my paper “Human–Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail” in the John Long Festschrift edition of IwC, and quote here:

In the case of the figure of five users, this was developed based on a combination of a mathematical model and empirical results (Nielsen and Landauer 1993). The figure of five users is:

(i) about the optimal cost/benefit point within an iterative development cycle; considerably more users are required for summative evaluation or where there is only the opportunity for a single formative evaluation stage;

(ii) an average over a number of projects and needs to be assessed on a system by system basis; and

(iii) based on a number of assumptions, in particular, independence of faults, that are more reasonable for more fully developed systems than for early prototypes, where one fault may mask another.

We’ll look at this in more detail below, but critically, the number ‘5’ is not a panacea, even for formative evaluation.

As important as the kind of question you are asking is the kind of users you are using. So much of modern psychology is effectively the psychology of first-year psychology undergraduates (in the 1950s it was male prisoners). Is this representative? Does it matter? I’ll return to this at the end, but first let’s look briefly at each kind of question.

Finally, there is perhaps the real question “will the reviewers of my work think five users is enough” — good publications vs. good science. The answer is that they will certainly be as influenced by the Myth of Five Users as you are, so do good science … but be prepared to need to educate your reviewers too!

formative evaluation – prototyping cycle

As noted, formative evaluation was the scope of Nielsen and Landauer’s early work in 1993, which was then cited by Nielsen in his Alertbox in 2000, and which has now attained mythic status in the field.

The 1993 paper assumed a context of iterative development where there would be many iterations, and asked how many users should be used per iteration: that is, how many users you should test before fixing the problems they find, then performing another cycle of user testing, and another. So, in all the cases they considered, the total number of users involved would be far more than five; it is just the number used in each iteration that was lower.

In order to calculate the optimal number of subjects to use per iteration, they looked at:

(i) the cost of performing a user evaluation

(ii) the number of new faults found (additional users will see many of the same faults, so there are diminishing returns)

(iii) the cost of a redevelopment cycle

All of these differ depending on the kind of project, so Nielsen and Landauer looked at a range of projects of differing levels of complexity. By putting them together, and combining with simple probabilistic models of bug finding in software, you can calculate an optimal number of users per experiment.

They found that, depending on the project, the statistics and costs varied, and hence so did the optimal number of users/evaluators (between 7 and 21), with, on the whole, more complex projects (with more different kinds of bugs and more costly redevelopment cycles) having a higher optimum than simpler projects. In fact all the numbers are larger than five, but five was the number in Nielsen’s earlier discount engineering paper, so the paper did some additional calculations that yielded a different kind of (lower) optimum (3.2 users — pity the last 0.2 user), with five somewhere between 7 and 3.2 … and a myth was born!

Today, with Web 2.0 and ‘perpetual beta’, agile methods and continuous deployment reduce redevelopment costs to near zero, and so Twidale and Marty argue for ‘extreme evaluation’, where often one user may be enough (see also my IwC paper).

The number also varies through the development process; early on, one user (indeed using it yourself) will find many, many faults that need to be fixed. Later faults become more obscure, or maybe only surface after long-term use.

Of course, if you use some sort of expert or heuristic evaluation, then the answer may be no real users at all!

And anyway, all of this is about ‘fault finding’. Usability is not about bug fixing but about making things better, and it is not at all clear how useful, if at all, the literature on bug fixing is for creating positive experiences.

summative evaluation – is it good enough to release

If you are faced with a product and want to ask “is it good enough?” (which can mean, “are there any usability ‘faults’?”, or, “do people want to use it?”), then five users is almost certainly not enough. To give yourself any confidence of coverage of the kinds of users and kinds of use situations, you may need tens or hundreds of users, very like hypothesis testing (below).

However, the answer here may also be zero users. If the end product is the result of a continuous evaluation process with one, five or some other number of users per iteration, then the number of users who have seen the product during this process may be sufficient, especially if you are effectively iterating towards a steady state where few or no new faults are found per iteration.

In fact, even when there has been a continuous process, the need for long-term evaluation becomes more critical as the development progresses, and maybe the distinction between summative and late-stage formative is moot.

But in the end there is only one user you need to satisfy — the CEO … ask Apple.

quantitative questions and hypothesis testing

(Note, there are real numbers here, but if you are a numerophobe never fear, the next part will go back to qualitative issues, so bear with it!)

Most researchers know that “five is enough” does not apply in experimental or quantitative studies … but that doesn’t always stop them quoting numbers back!

Happily in dealing with more quantitative questions or precise yes/no ones, we can look to some fairly solid statistical rules for the appropriate number of users for assessing different kinds of effects (but do note “the right kind of users” below). And yes, very, very occasionally five may be enough!

Let’s imagine that our hypothesis is that a behaviour will occur in 50% of users doing an experiment. With five users, the probability that we will see this behaviour in at least one user is 1 in 32, which is approximately 3%. That is if we do not observe the behaviour at all, then we have a statistically significant result at 5% level (p<0.05) and can reject the hypothesis.
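For those who like to check the arithmetic, here is a minimal sketch in plain Python (the helper name is just for illustration):

```python
# Probability of seeing a behaviour at least once among n users, when
# each user independently shows it with probability p.
def p_at_least_once(p, n):
    return 1 - (1 - p) ** n

# Five users, a behaviour that half of all users exhibit:
p_miss = (1 - 0.5) ** 5         # probability of never seeing it: 1/32
print(p_miss)                   # 0.03125, i.e. about 3%
print(p_at_least_once(0.5, 5))  # 0.96875, i.e. about 97%
```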

Note that there is a crucial difference between a phenomenon that we expect to see in about 50% of user iterations (i.e. the same user will do it about 50% of the time) and one where we expect 50% of people to do it all of the time. The former we can deal with using a small number of users and maybe longer or repeated experiments, the latter needs more users.

If instead we wanted to show that a behaviour happens less than 25% of the time, then we need at least 11 users; for 10%, 29 users. On the other hand, if we hypothesised that a behaviour happens 90% of the time and didn’t see it in just two users, we can reject the hypothesis at a significance level of 1%. In the extreme, if our hypothesis is that something never happens and we see it with just one user, or if the hypothesis is that it always happens and we fail to see it with one user, in both cases we can reject our hypothesis.
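The same reasoning gives a general rule for how many users you need before a non-observation becomes significant; a small sketch (my own helper, not from any statistics package):

```python
import math

# Smallest n with (1 - p)**n < alpha: seeing the behaviour in none of
# n users then rejects "it occurs in a fraction p of users" at level alpha.
def users_needed(p, alpha=0.05):
    return math.ceil(math.log(alpha) / math.log(1 - p))

print(users_needed(0.25))  # 11
print(users_needed(0.10))  # 29
print(users_needed(0.90))  # 2 - the 90% hypothesis falls after two users
```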

The above only pertains when you see samples where either none or all of the users do something. More often we are trying to assess some number. Rather than “does this behaviour occur 50% of the time”, we are asking “how often does this behaviour occur”.

Imagine we have 100 users (a lot more than five!), and notice that 60% do one thing and 40% do the opposite. Can we conclude that in general the first thing is more prevalent? The answer is yes, but only just. Where something is a simple yes/no or either/or choice and we have counted the replies, we have a binomial distribution. If we have n (100) users and the probability of them answering ‘yes’ is p (50% if there is no real preference), then the maths says that the average number of times we expect to see a ‘yes’ response is n x p = 100 x 0.5 = 50 people — fairly obvious. It also says that the standard deviation of this count is sqrt(n x p x (1-p)) = sqrt(25) = 5. As a rule of thumb, if answers differ by more than 2 standard deviations from the expected value, then this is statistically significant; so 60 ‘yes’ answers vs. the expected 50 is significant at 5%, but 55 would have just been ‘in the noise’.
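The rule of thumb can be sketched directly (using 1.96 standard deviations, the exact two-sided 5% threshold that the round figure of 2 approximates):

```python
import math

# Two-sided z-test for a yes/no count against "no preference" (p = 0.5).
def significant(yes, n, p=0.5, z=1.96):
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return abs(yes - mean) > z * sd

print(significant(60, 100))  # True  - just significant (2 sd above 50)
print(significant(55, 100))  # False - within the noise (1 sd above 50)
```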

Now drop this down to 10 users and imagine you have 7 ‘yes’s and 3 ‘no’s. These users, in this experiment, answer ‘yes’ more than twice as often as ‘no’, but the difference is still no more than we might expect by chance. You need at least 8 to 2 before you can say anything more. For five users even 4 to 1 is quite normal (try tossing five coins and see how many come up heads); only if all or none do something can you start to think you are onto something!
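For small numbers like these you can compute the exact binomial tail rather than rely on the rule of thumb; a quick sketch:

```python
from math import comb

# Exact tail of the fair-coin binomial: the chance of at least k
# 'yes' answers out of n, if answers were really 50:50.
def tail(n, k):
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(tail(10, 7))  # 0.171875  - 7 out of 10 is still just noise
print(tail(10, 8))  # 0.0546875 - 8 out of 10 is borderline
print(tail(5, 4))   # 0.1875    - 4 heads from 5 coins is quite normal
```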

For more complex kinds of questions such as “how fast”, rather than “how often”, the statistics becomes a little more complex, and typically more users are needed to gain any level of confidence.

As a rule of thumb, some psychologists talk of 20 users per condition, so if you are comparing 4 things you need 80 users. However, this is just a rule of thumb: some phenomena have very high variability (e.g. problem solving), whereas others (such as low-level motor actions) are more repeatable for an individual and more consistent between users. For phenomena with very high variability, even 20 users per condition may be too few, although within-subjects designs may help where possible. Pilot experiments or previous experiments concerning the same phenomenon are important, but this is probably the time to consult a statistician, who can assess the statistical ‘power’ of a suggested design (the likelihood that it will reveal the issue of interest).
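Statistical power can also be estimated by simulation. The sketch below is purely illustrative (the effect size, standard deviations and simple two-sample z-test are made up for the example; a real design should still be checked with a statistician):

```python
import math
import random

# Monte-Carlo estimate of power: the fraction of simulated experiments
# in which a real difference of `effect` between two groups comes out
# significant under a two-sample z-test (known sd, two-sided 5% level).
def power(n_per_group, effect, sd, trials=2000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        se = sd * math.sqrt(2.0 / n_per_group)
        z = (sum(b) / n_per_group - sum(a) / n_per_group) / se
        if abs(z) > 1.96:
            hits += 1
    return hits / trials

# Consistent phenomenon (sd comparable to the effect): 20 per condition does well.
print(power(20, effect=100, sd=100))
# Highly variable phenomenon (sd four times the effect): 20 is far too few.
print(power(20, effect=100, sd=400))
```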

qualitative study

Here none of the above applies and … so … well … hmm how do you decide how many users? Often people rely on ‘professional judgement’, which is a posh way of saying “finger in the air”.

In fact, some of the numerical arguments above still apply (sorry, numerophobes). If, as part of your qualitative study, you are interested in a behaviour that you believe happens about half the time, then with five users you would be very unlucky not to observe it (3% of the time). Or to put it another way: if you observe five users you will see around 97% of the behaviours that at least half of all users have (with loads and loads of assumptions!).

If you are interested in rarer phenomena, then you need lots more users: for behaviour that you only see in 1 in 10 users, you have only a 40% chance of observing it with 5 users and, perhaps more surprisingly, only a 65% chance of seeing it with 10 users.

However, if you are interested in a particular phenomenon, then randomly choosing people is not the way to go anyway; you will obviously select people who you feel are most likely to exhibit it. The aim is not to assess its prevalence in the world, but to find a few instances and see what happens.

Crucially when you generalise from qualitative results you do it differently.

Now in fact you will see many qualitative papers that add caveats to say “our results only apply to the group studied …”. This may be necessary to satisfy certain reviewers, but is at best disingenuous – if you really believe the results of your qualitative work do not generalise at all, then why are you publishing it – telling me things that I cannot use?

In fact, we do generalise from qualitative work, with care, noting the particular limitations of the groups studied, but still assume that the results are of use beyond the five, ten or one hundred people that we observed. However, we do not generalise through statistics, or from the raw data, but through reasoning that certain phenomena, even if only observed once, are likely to be ones that will be seen again, even if differing in details. We always generalise from our heads, not from data.

Whether it is one, five or more, by its nature deep qualitative data will involve fewer users than more shallow methods such as large scale experiments or surveys. I often find that the value of this kind of deep interpretative data is enhanced by seeing it alongside large-scale shallow data. For example, if survey or log data reveals that 70% of users have a particular problem and you observe two users having the same problem, then it is not unreasonable to assume that the reasons for the problem are similar to those of the large sample — yes you can generalise from two!

Indeed one user may be sufficient (as often happens with medical case histories, or business case studies), but often it is about getting enough users so that interesting things turn up.

exploratory study

This looking for interesting things is often the purpose of research: finding a problem to tackle. Once we have found an interesting issue, we may address it in many ways: formal experiments, design solutions, qualitative studies; but none of these are possible without something interesting to look at.

In such situations, as we saw with qualitative studies in general, the sole criterion for “is N enough” is whether you have learnt something.

If you want to see all, or most, of the common phenomena, then you need lots of users. However, if you just want to find one interesting one, then you only need as many as gets you there. Furthermore, whilst you often choose ‘representative’ or ‘typical’ users (what is a typical user!) for most kinds of study and evaluation, for exploratory analysis extreme users are often the most insightful; of course, you have to work out whether your user or users are so unusual that the things you observe are unique to them … but again, real research comes from the head: you have to think about it and make an assessment.

In the IwC paper I discuss some of the issues of single person studies in more detail and Fariza Razak’s thesis is all about this.

the right kind of users

If you have five, fifty or five hundred users, but they are all psychology undergraduates, they are not going to tell you much about usage by elderly users, or by young unemployed people who have left school without qualifications.

Again the results of research ultimately come from the head not the data: you will never get a complete typical, or representative sample of users; the crucial thing is to understand the nature of the users you are studying, and to make an assessment of whether the effects you see in them are relevant, and interesting more widely. If you are measuring reaction times, then education may not be a significant factor, but Game Boy use may be.

Many years ago I was approached by a political science PhD student. He had survey data from over 200 people (not just five!), and wanted to know how to calculate error bars to go on his graphs. This was easily done and I explained the procedure (a more systematic version of the short account given earlier). However, I was more interested in the selection of those 200 people. They were Members of Parliament; he had sent the survey to every MP (all 650 of them) and got over 200 replies, a 30% return rate, which is excellent for any survey. However, this was a self-selected group and so I was more interested in whether the grounds for self-selection influenced the answers than in how many of them there were. It is often the case that those with strong views on a topic are more likely to answer surveys on it. The procedure he had used was as good as possible, but, in order to be able to make any sort of statement about the interpretation of the data, he needed to make a judgement. Yet again knowledge is ultimately from the head not the data.

For small numbers of users these choices are far more critical. Do you try and choose a number of similar people, so you can contrast them, or very different so that you get a spread? There is no right answer, but if you imagine having done the study and interpreting the results this can often help you to see the best choice for your circumstances.

being practical

In reality whether choosing how many, or who, to study, we are driven by availability. It is nice to imagine that we make objective selections based on some optimal criteria — but life is not like that. In reality, the number and kind of users we study is determined by the number and kind of users we can recruit. The key thing is to understand the implications of these ‘choices’ and use these in your interpretation.

As a reviewer I would prefer honesty here, to know how and why users were selected so that I can assess the impact of this on the results. But that is a counsel of perfection, and again good science and getting published are not the same thing! Happily there are lovely euphemisms such as ‘convenience sample’ (who I could find) and ‘snowball sample’ (friends of friends, and friends of friends of friends), which allow honesty without insulting reviewers’ academic sensibilities.

in the end

Is five users enough? It depends: one, five, fifty or one thousand (Google tests live with millions!). Think about what you want out of the study (numbers, ideas, faults to fix) and the confidence and coverage of the issues you are interested in, and let that determine the number.

And, if I’ve not said it enough already, in the end good research comes from your head, from thinking about and understanding the users and the questions you are posing, not from the fact that you had five users.

references

A. Dix (2010). Human–Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail. Interacting with Computers, 22(1), pp. 13–27. http://www.hcibook.com/alan/papers/IwC-LongFsch-HCI-2010/

Nielsen, J. (1989). Usability engineering at a discount. In Salvendy, G., and Smith, M.J. (Eds.), Designing and Using Human–Computer Interfaces and Knowledge Based Systems. Elsevier Science Publishers, Amsterdam, 394–401.

Nielsen, J. and Landauer, T. K. (1993). A mathematical model of the finding of usability problems. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands, April 24–29, 1993). ACM, New York, NY, 206–213. http://doi.acm.org/10.1145/169059.169166

Jakob Nielsen’s Alertbox, March 19, 2000: Why You Only Need to Test With 5 Users. http://www.useit.com/alertbox/20000319.html

Fariza Razak (2008). Single Person Study: Methodological Issues. PhD Thesis, Computing Department, Lancaster University, UK. February 2008. http://www.hcibook.net/people/Fariza/

Michael Twidale and Paul Marty (2004–). Extreme Evaluation. http://people.lis.uiuc.edu/~twidale/research/xe/

the ordinary and the normal

I am reading Michel de Certeau’s “The Practice of Everyday Life“.  The first chapter begins:

The erosion and denigration of the singular or the extraordinary was announced by The Man Without Qualities1: “…a heroism but enormous and collective, in the model of ants”. And indeed the advent of the anthill society began with the masses, … The tide rose. Next it reached the managers … and finally it invaded the liberal professions that thought themselves protected against it, including even men of letters and artists.

Now I have always hated the word ‘normal’, although loved the ‘ordinary’.  This sounds contradictory as they mean almost the same, but the words carry such different connotations. If you are not normal you are ‘subnormal’ or ‘abnormal’, either lacking in something or perverted.  To be normal is to be normalised, to be part of the crowd, to obey the norms, but to be distinctive or different is wrong.  Normal is fundamentally fascist.

In contrast the ordinary does not carry the same value judgement.  To be different from ordinary is to be extra-ordinary2, not sub-ordinary or ab-ordinary.  Ordinariness does not condemn otherness.

Certeau is studying the everyday. The quote is ultimately about the apparently relentless rise of the normal over the ordinary, whereas Certeau revels in the small ways ordinary people subvert norms and create places within the interstices of the normal.

The more I study the ordinary, the mundane, the quotidian, the more I discover how extraordinary is the everyday3. Both the ethnographer and the comedian are expert at making strange, taking up the things that are taken for granted and holding them for us to see, as if for the first time. Walk down an anodyne (normalised) shopping street, and then look up from the facsimile store fronts and suddenly cloned city centres become architecturally unique.  Then look through the crowd and amongst the myriad incidents and lives around, see one at a time, each different.

Sometimes it seems as if the world conspires to remove this individuality. The InfoLab21 building that houses the Computing Dept. at Lancaster was shortlisted for a people-centric design award of ‘best corporate workspace’. Before the judging we had to remove any notices from doors or any other sign that the building was occupied: nothing individual, nothing ordinary, sanitised, normalised.

However, all is not lost. I was really pleased the other day to see a paper, “Making Place for Clutter and Other Ideas of Home”4. Laurel, Alex and Richard are looking at the way people manage the clutter in their homes: keys in bowls to keep them safe, or bowls on a worktop ready to be used. They are looking at the real lives of ordinary people, not the normalised homes of design magazines, where no half-drunk coffee cup graces the coffee table, nor the high-tech smart homes where misplaced papers would confuse the sensors.

Like Fariza’s work on designing for one person5, “Making a Place for Clutter” is focused on single case studies not broad surveys.  It is not that the data one gets from broader surveys and statistics is not important (I am a mathematician and a statistician!), but read without care the numbers can obscure the individual and devalue the unique.  I heard once that Stalin said, “a million dead in Siberia is a statistic, but one old woman killed crossing the road is a national disaster”. The problem is that he could not see that each of the million was one person too. “Aren’t two sparrows sold for only a penny? But your Father knows when any one of them falls to the ground.”6.

We are ordinary and we are special.

  1. The Man without Qualities, Robert Musil, 1930–42, originally Der Mann ohne Eigenschaften. Picador Edition 1997, trans. Sophie Wilkins and Burton Pike.
  2. Sometimes ‘extraordinary’ may be ‘better than’, but more often simply ‘different from’, literally the Latin ‘extra’ = ‘outside of’.
  3. As in my post about the dinosaur joke!
  4. Swan, L., Taylor, A. S., and Harper, R. (2008). Making place for clutter and other ideas of home. ACM Trans. Comput.-Hum. Interact. 15, 2 (Jul. 2008), 1–24. DOI: http://doi.acm.org/10.1145/1375761.1375764
  5. Described in Fariza’s thesis, Single Person Study: Methodological Issues, and in the notes of my SIGCHI Ireland Inaugural Lecture, Human–Computer Interaction in the early 21st century: a stable discipline, a nascent science, and the growth of the long tail.
  6. Matthew 10:29

Touching Technology

I’ve given a number of talks over recent months on aspects of physicality: twice during winter schools in Switzerland and India that I blogged about (From Anzere in the Alps to the Taj Bangalore in two weeks) a month or so back, and twice during my visit to Athens and Tripolis a few weeks ago.

I have finished writing up the notes of the talks as “Touching Technology: taking the physical world seriously in digital design”. The notes are partly a summary of material presented in previous papers and partly new material. Here is the abstract:

Although we live in an increasingly digital world, our bodies and minds are designed to interact with the physical. When designing purely physical artefacts we do not need to understand how their physicality makes them work – they simply have it. However, as we design hybrid physical/digital products, we must now understand what we lose or confuse by the added digitality. With two and a half millennia of philosophical ponderings since Plato and Aristotle, several hundred years of modern science, and perhaps one hundred and fifty years of near-modern engineering – surely we know sufficient about the physical for ordinary product design? While this may be true of the physical properties themselves, it is not true of the way people interact with and rely on those properties. It is only when the nature of physicality is perturbed by the unusual and, in particular, the digital, that it becomes clear what is and is not central to our understanding of the world. This talk discusses some of the obvious and not so obvious properties that make physical objects different from digital ones. We see how we can model the physical aspects of devices and how these interact with digital functionality.

After finishing typing up the notes I realised I have become worryingly scholarly – 59 references and it is just notes of the talk!

Alan looking scholarly

Searle’s wall, computation and representation

Reading a bit more of Brian Cantwell Smith’s “On the Origin of Objects”, he refers (p. 30–31) to Searle’s wall that, according to Searle, can be interpreted as implementing a word processor. This all hinges on predicates introduced by Goodman, such as ‘grue’, meaning “green if examined before time t, or blue if examined after”:

grue(x) = if ( now() < t ) green(x)
          else blue(x)

The problem is that an emerald apparently changes state from grue to not-grue at time t, without any work being done. Searle’s wall is just an extrapolation of this, so that you can interpret the state of the wall at a time as something arbitrarily complex, without it ever changing at all.

This issue of the fundamental nature of computation has long seemed to me the ‘black hole’ at the heart of our discipline (I’ve alluded to this before in “What is Computing?”). Arguably we don’t understand information much either, but at least we can measure it – we have a unit, the bit; but with computation we cannot even measure it except with reference to a specific implementation architecture, whether Turing machine or Intel Core. Common sense (or at least programmers’ common sense) tells us that any given computational device has only so much computational ‘power’ and that any problem has a minimum amount of computational effort needed to solve it, but we find this hard to quantify precisely. However, by Searle’s argument we can do arbitrary amounts of computation with a brick wall.

For me, a defining moment came about 10 years ago. I recall I was in Loughborough for an examiners’ meeting, and clearly looking through MSc scripts had lost its thrill as I was daydreaming about computation (as one does). I was thinking about the relationship between computation and representation, and in particular about the fastest (I believe) way to do multiplication of very large numbers, the Schönhage–Strassen algorithm.

If you’ve not come across this, the algorithm hinges on the fact that multiplication is a form of convolution (sum of a[i] * b[n-i]) and a Fourier transform converts convolution into pointwise multiplication (simply a[i] * b[i]). The algorithm looks something like:

1. represent the numbers, a and b, in base B (for suitable B)
2. perform an FFT on a and b to give af and bf
3. perform pointwise multiplication on af and bf to give cf
4. perform an inverse FFT on cf to give cfi
5. tidy up cfi a bit by doing carries etc. to give c
6. c is the answer (a*b) in base B
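The steps above can be sketched in runnable form. This is an illustrative toy (plain Python, base 10, a simple recursive complex-valued FFT, non-negative integers only), not the real Schönhage–Strassen algorithm, which works in modular arithmetic precisely to avoid floating-point error:

```python
import cmath

def fft(a, invert=False):
    # Recursive radix-2 FFT; len(a) must be a power of 2.
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply(x, y, B=10):
    # Step 1: represent the numbers in base B, least significant digit first.
    a = [int(d) for d in str(x)[::-1]]
    b = [int(d) for d in str(y)[::-1]]
    n = 1
    while n < len(a) + len(b):
        n *= 2
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    # Steps 2-3: FFT both, then multiply pointwise.
    af, bf = fft(a), fft(b)
    cf = [af[i] * bf[i] for i in range(n)]
    # Step 4: inverse FFT (dividing by n to normalise).
    cfi = [v / n for v in fft(cf, invert=True)]
    # Step 5: tidy up by rounding and doing carries.
    digits = [round(v.real) for v in cfi]
    carry = 0
    for i in range(n):
        d = digits[i] + carry
        digits[i] = d % B
        carry = d // B
    while carry:
        digits.append(carry % B)
        carry //= B
    while len(digits) > 1 and digits[-1] == 0:
        digits.pop()
    # Step 6: assemble the answer.
    return int(''.join(str(d) for d in digits[::-1]))

print(multiply(123456789, 987654321))  # same as 123456789 * 987654321
```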

In this, the heart of the computation is the pointwise multiplication at step 3; this is what ‘makes it’ multiplication. However, this is a particularly extreme case where the change of representation (steps 2 and 4) makes the computation easier. What had been a quadratic O(N^2) convolution is now a linear O(N) number of pointwise multiplications (strictly O(n), where n = N/log(B)). This change of representation is in fact so extreme that now the ‘real work’ of the algorithm in step 3 takes significantly less time (O(n) multiplications) than the changes of representation at steps 2 and 4 (the FFT is O(n log(n)) multiplications).

Forgetting the mathematics this means the majority of the computational time in doing this multiplication is taken up by the change of representation.

In fact, if the data had been presented for multiplication already in FFT form and the result were expected in FFT representation, then the computational ‘cost’ of multiplication would have been linear … or, to be even more extreme, if instead of representing two numbers as a and b we represent them as a*b and a/b, then multiplication is free. In general, computation lies as much in the complexity of putting something into a representation as in the manipulation of it once it is represented. Computation is change of representation.

In a letter to CACM in 1966 Knuth said1:

When a scientist conducts an experiment in which he is measuring the value of some quantity, we have four things present, each of which is often called “information”: (a) The true value of the quantity being measured; (b) the approximation to this true value that is actually obtained by the measuring device; (c) the representation of the value (b) in some formal language; and (d) the concepts learned by the scientist from his study of the measurements. It would seem that the word “data” would be most appropriately applied to (c), and the word “information” when used in a technical sense should be further qualified by stating what kind of information is meant.

In these terms, problems are about information, whereas algorithms operate on data … but the 'cost' of computation also has to include the cost of turning information into data and back again.

Back to Searle's wall and Goodman's emerald.  The emerald 'changes' state from grue to not-grue with no cost or work, but in order to ask the question "is this emerald grue?" the answer will involve computation (if (now()<t) …).  Similarly, if we have rules like this, but so complicated that Searle's wall 'implements' a word processor, that is fine; but in order to work out what is on the word processor 'screen' based on observation of the (unchanging) wall, the computation involved in making that observation would be equivalent to running the word processor.
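That grue conditional can be made concrete — the point being that the work happens in the observer's check, not in the emerald (the switch-over time t and the colour encoding here are of course invented purely for illustration):

```python
import time

# assumed switch-over time t -- purely illustrative
T = time.mktime((2030, 1, 1, 0, 0, 0, 0, 0, -1))

def is_grue(observed_colour, t=T):
    """Goodman's predicate: something is 'grue' if it is green when
    examined before time t, and blue when examined after t.
    Nothing happens to the emerald at t -- the conditional lives
    here, in the computation performed by the observer."""
    expected = "green" if time.time() < t else "blue"
    return observed_colour == expected
```

Asking the question costs a comparison and a clock read; the emerald itself does nothing.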

At a theoretical level, this reminds us that when we compare the computation in a Turing machine with that in an Intel processor or the lambda calculus, we need to consider the costs of the changes of representation between them.  And at a practical level, we all know that 90% of the complexity of any program is in the I/O.

  1. Donald Knuth, “Algorithm and Program; Information and Data”, Letters to the editor. Commun. ACM 9, 9, Sep. 1966, 653-654. DOI= http://doi.acm.org/10.1145/365813.858374 [back]

making life easier – quick filing in visible folders

It is one of those things that has bugged me for years … and if it were right I would probably not even notice it was there – such is the nature of good design.  But when I am saving a file from an application and I already have a folder window open, why is it not easier to select that open folder as the destination?

A scenario: I have just been writing a reference for a student and have a folder for the references open on my desktop. I select "Save As …" from the Word menu and get a file selection dialogue, but I have to navigate through my hard disk to find the folder, even though I can see it right in front of me (and I have over 11,000 folders, so it does get annoying).

The solution to this is easy: some sort of virtual folder at the top level of the file tree labelled "Open Folders …" that contains a list of the currently open folder windows in the Finder.  Indeed, for years I instinctively clicked on the 'Desktop' folder expecting it to contain the open windows, but of course this just refers to the various aliases and files permanently on the desktop background, not the open windows I can see in front of me.

In fact, as Mac OS X is built on top of UNIX, there is an easy, very UNIX-ish fix (or maybe hack): the Finder could simply maintain an actual folder (probably on the desktop) called "Finder Folders" and add aliases to folders as you navigate.  Although less in the spirit of Windows, this would certainly be possible there too, and of course on any of the Linux-based systems.  … so OS developers out there, "fix it" – it is easy.

So why is it that this problem is persistent and annoying, and has an easy fix, and yet is still there in every system I have used after 30 years of windowing systems?

First, while it is annoying and persistent, it does not stop you getting things done; it is about efficiency, not a 'bug' … and system designers love to say, "but it can do X", and then send flying fingers over the keyboard to show you just how.  So it gets overshadowed by bigger issues and never appears in bug lists – and even though it has annoyed me for years, no, I have never sent a bug report to Apple either.

Second, it is only a problem when you have sufficient files.  This means it is unlikely to be encountered during normal user testing.  There is a class of problems like this, along with 'expert slips'1, that require very long-term use before they become apparent.  Rigorous user testing is not sufficient to produce usable systems. To be fair, many people have a relatively small number of files and folders (often just one enormous "My Documents" folder!), but at a time when PCs ship with hundreds of gigabytes of disk it does seem slightly odd that so much software fails either in terms of user interface (as in this case) or in terms of functionality (Spotlight is seriously challenged by my disk) when you actually use the space!

Finally, and I think the real reason, is in the implementation architecture.  For all sorts of good software engineering reasons, the functional separation between applications is very strong.  Typically the only ways they 'talk' are through cut-and-paste or drag-and-drop, with occasional scripting for real experts. In most windowing environments the 'application' that lets you navigate files (Finder on the Mac, File Explorer in Windows) is just another application like all the rest.  From a system point of view, the file selection dialogue is part of the lower-level toolkit and has no link to the particular application called 'Finder'.  However, to me as a user, the Finder is special; it appears to me (and I am sure to most users) as 'the computer', and certainly part of the 'desktop'.  Implementation architecture has a major interface effect.

But even if the Finder is 'just another application', the same holds for all applications.  As a user I see them all, and if I have selected a font in one application, why is it not easier to select the same font in another?  In the semantic web world there is an increasing move towards open data / linked data / the web of data2, all about moving data out of application silos.  However, this usually refers to persistent data, more like the file system of the PC … which actually is shared, at least physically, between applications; what is also needed is for some of the ephemeral state of interaction to be shared on a moment-to-moment basis.

Maybe this will emerge anyway with increasing numbers of micro-applications such as widgets … although if anything they sit in silos as much as larger applications, just smaller silos.  In fact, I think the opposite is true: micro-applications and desktop mash-ups require us to understand better, and to develop, just these ways of allowing applications to 'open up', so that they can see what the user sees.

  1. see “Causing Trouble with Buttons” for how Steve Brewster and I once forced infrequent expert slips to happen often enough to be user testable[back]
  2. For example the Web of Data Practitioners Days I blogged about a couple of months back and the core vision of Talis Platform that I’m on the advisory board of.[back]

Dublin, Guinness and the future of HCI

I wrote the title of this post on 5th December, just after I got back from Dublin.  I had been in Dublin for the SIGCHI Ireland Inaugural Lecture "Human–Computer Interaction in the early 21st century: a stable discipline, a nascent science, and the growth of the long tail" and was going to write a bit about it (including my first flight on and off Tiree) … then thought I'd write a short synopsis of the talk … so parked this post until the synopsis was written.

One month and 8000 words later – the 'synopsis' sort of grew … but it is just finished and now on the web as either an HTML version or a PDF. Basically a sort of 'state of the nation' about the current state of, and challenges for, HCI as a discipline …

And although it now fades a little, I had a great time in Dublin: meeting people, talking research, good company, good food … and yes … the odd pint of Guinness too.

Coast to coast: St Andrews to Tiree

A week ago I was in St Andrews on the east coast of Scotland delivering three lectures on “Human Computer Interaction: as it was, as it is and as it may be” as part of their distinguished lecture series and now I am in Tiree in the wild western ocean off the west coast.

I had a great time in St Andrews and was well looked after by some I knew already – Ian, Gordan, John and Russell – and also met many new people. I ate good food and stayed in a lovely hotel overlooking the sea (and golf course), full of pictures of golfers (well, what do you expect in St Andrews?).

For the lectures, I was told the general pattern was one lecture about the general academic area, one on the 'state of the art', and one about my own work … hence the three parts of the title!  Ever one for cutesy titles, I called the individual lectures "Whose Computer Is It Anyway", "The Great Escape" and "Connected, but Under Control, Big, but Brainy?".

The first lecture was about the fact that computers are always ultimately for people (surprise surprise!) and I used Ian’s slight car accident on the evening before the lecture as a running example (sorry Ian).

The second lecture was about the way computers have escaped the office desktop and found their way into the physical world of ubiquitous computing, into the digital world of the web, and into our everyday lives in our homes – and increasingly the hub of our social lives too.  Matt Oppenheim did some great cartoons for this and I'm going to use them again in a few weeks when I visit Dublin to do the inaugural lecture for SIGCHI Ireland.

for 20 years the computer is chained to the office desktop (image © Matt Oppenheim)

(© Matt Oppenheim)

... now escapes: out into the world, spreading across the net, in the home, in our social lives (image © Matt Oppenheim)

(© Matt Oppenheim)

The last lecture was about intelligent internet stuff, similar to the lecture I gave at Aveiro a couple of weeks back … mentioning again the fact that the web now has the same information storage and processing capacity as a human brain1 … always makes people think … well at least it always makes ME think about what it means to be human.

… and now … in Tiree … sun, wild wind, horizontal hail, and paddling in the (rather chilly) sea at dawn

  1. see the brain and the web[back]

From raw experience to personal reflection

Just a week to go until the deadline for the workshop on Designing for Reflection on Experience that Corina and I are organising at CHI. Much of the time, discussions of user experience are focused on trivia, and even social networking often appears to stop at superficial levels.  While throwing a virtual banana at a friend may serve to maintain relationships, and is perhaps less trivial than it at first appears, there is still little support for deeper reflection on life, with the possible exception of the many topic-focused chat groups.  However, in researching social networks we have found, amongst the flotsam, clear moments of poignancy and conflict, traces of major life events … even divorce by Facebook. Too much navel-gazing would not be a good thing, but some attention to expressing deeper issues, to others and to ourselves, seems overdue.

PPIG2008 and the twenty first century coder

Last week I was giving a keynote at the annual workshop PPIG2008 of the Psychology of Programming Interest Group.   Before I went I was politely pronouncing this pee-pee-eye-gee … however, when I got there I found the accepted pronunciation was pee-pig … hence the logo!

My own keynote at PPIG2008 was "as we may code: the art (and craft) of computer programming in the 21st century", an exploration of the changes in coding since 1968, when Knuth published the first of his books on "the art of computer programming".  On the web site for the talk I've made a relatively unstructured list of some of the distinctions I've noticed between 20th- and 21st-century coding (C20 vs. C21); in my slides I have started to add some more structure.  In general, we have a move from a more mathematical, analytic, problem-solving approach to something more akin to a search task: finding the right bits to fit together, with a greater need for information management and social skills. Both this characterisation and the list are, of course, gross simplifications, but they seem to capture some of the change of spirit.  These changes suggest different cognitive issues to be explored, and maybe different personality types involved – as one of the attendees, David Greathead, pointed out, rather like the judging vs. perceiving personality distinction in Myers-Briggs1.

One interesting comment on this was from Marian Petre, who has studied many professional programmers.  Her impression, echoed by others, was that the heavy-hitters were the more experienced programmers who had adapted to newer styles of programming, whereas the younger programmers found it harder to adapt the other way when they hit difficult problems.  Another attendee suggested that perhaps I was focused more on application coding, and that system coding and system programmers are still operating in the C20 mode.

The social nature of modern coding came out in several papers about agile methods and pair programming.  As well as being an important phenomenon in its own right, pair programming gives a level of think-aloud 'for free', so maybe it will also cast light on individual coding.

Margaret-Anne Storey gave a fascinating keynote about the use of comments and annotations in code, and again this picks up the social nature of code: she was studying open-source coding, where comments are often for other people in the community, maybe explaining actions or suggesting improvements.  She reviewed a lot of material in the area, and I was especially interested in one result showing that novice programmers working with small pieces of code found method comments more useful than class comments.  Given my own frequent complaint that code is inadequately documented at the class or higher level, this appeared to disagree with my own impressions.  However, in discussion it seemed that this was probably accounted for by differences in context: novice vs. expert programmers, small vs. large code, internal comments vs. external documentation.  One of the big problems I find is that the way different classes work together to produce effects is particularly poorly documented.  Margaret-Anne described one system her group had worked on2 that allowed you to write a tour of your code, opening windows, highlighting sections, etc.

I sadly missed some of the presentations as I had to go to other meetings (the danger of a conference at your home site!), but I did get to some, and was particularly fascinated by the more theoretical/philosophical session, including one paper addressing the psychological origins of the notion of objects and another focused on (the dangers of) abstraction.

The latter, presented by Luke Church, critiqued Jeanette Wing's 2006 CACM paper on Computational Thinking.  This is evidently a 'big thing', with loads of funding and hype … but one that I had entirely missed :-/ Basically the idea is to translate the ways that one thinks about computation to problems other than computers – nerds rule OK. The tenets of computational thinking seem to overlap a lot with management thinking, and also reminded me of the way my own HCI community, and parts of the Design (with a capital D) community, in different ways try to say that we/they are the universal discipline … well, if we don't say it about our own discipline, who will … the physicists have been getting away with it for years 😉

Luke's (and his co-authors') argument is that abstraction can be dangerous (although of course it is also powerful).  It would be interesting, rather than Wing's paper alone, to look at this argument alongside Jeff Kramer's 2007 CACM article "Is abstraction the key to computing?", which I recall liking because it says computer scientists ought to know more mathematics 🙂 🙂

I also sadly missed some of Adrian Mackenzie's closing keynote … although this time not due to competing meetings, but because I had been up since 4:30am reading a PhD thesis and, after lunch on a Friday, had begun to flag!  However, this was no reflection on Adrian's talk, and the bits I heard were fascinating, looking at the way bio-tech is using the language of software engineering.  This sparked a debate relating back to the overuse of abstraction, especially in the case of the genome, where interactions between parts are strong and so the software-component analogy is weak.  It also reminded me of yet another relatively recent paper3 on the way computation can be seen in many phenomena and should not be construed solely as a science of computers.

As well as the academic content, it was great to be with the PPIG crowd; they are a small but very welcoming and accepting community – I don't recall anything but constructive and friendly debate … and next year they have PPIG09 in Limerick – PPIG and Guinness, what could be better!

  1. David has done some really interesting work on the relationship between personality types and different kinds of programming tasks.  I've seen him present before about debugging, and unfortunately had to miss his talk at PPIG on comprehension.  Given his work has shown clearly that there are strong correlations between certain personality attributes and coding, it would be good to see more qualitative work investigating the nature of the differences.   I'd like to know whether strategies change between personality types: for example, between systematic debugging and more insight-based 'scan and see it' bug finding. [back]
  2. but which I can't find on their website :-([back]
  3. Perhaps 2006/2007 in either CACM or Computer Journal, if anyone knows the one I mean please remind me![back]