Six weeks on the road

I’ve been at home for the last week after six weeks travelling around the UK and elsewhere.  I’ve not kept up while on the road, so I am doing a retrospective post on it all and need to try to catch up on other half-written posts.

As well as time at Talis offices in B’ham and at Lancs (including exam board week), travels have taken me to Pisa for a workshop on ‘Supportive User Interfaces’, to Koblenz for Web Science conference giving a talk on embodiment issues and a poster on web-scale reasoning, to Newcastle for British HCI conference doing a talk on vfridge, to Nottingham to give a talk on extended episodic experience, and back to Lancs for a session on creativity! Why can’t I be like sensible folks and talk on one topic!

Supportive User Interfaces

Monday 13th June I attended a workshop in Pisa on “Supportive User Interfaces”, which includes interfaces that adapt in various ways to users.  The majority of people there were involved in various forms of model-based user interfaces in which various models of the task, application and interaction are used to generate user interfaces on the fly. W3C have had a previous group in this area; Dave Raggett from W3C was at the workshop and it sounds like there will be a new working group soon.  This clearly has strong links to various forms of ‘meta-level’ representations of data, tasks, etc.  My own contribution started the day, framing the area, focusing partly on reasons for having more ‘meta-level’ interfaces including social empowerment, and partly on the principles/techniques that need to be considered at a human level.

Also on Monday was a meeting of IFIP Working Group 2.7/13.4. IFIP is the UNESCO founded pan-national agency that national computer societies such as the BCS in the UK and ACM and IEEE Computer in the US belong to.  Working Group 2.7/13.4 is focused on the engineering of user interfaces.  I had been actively involved in the past, but have had many years’ lapse.  However, this seemed a good thing to re-engage with, with my new Talis hat on!

SUI: paper:

Web Science Conference in Koblenz

Jaime Teevan from Microsoft gave the opening keynote at WebSci 2011.  I know her from her earlier work on personal information management, but her recent work and keynote was about work on analysing and visualising changes in web pages.  Web page changes are also analysed alongside users’ re-visitation patterns; by looking at the frequency of re-visitation Jaime and her colleagues are able to identify the parts of pages that change with similar frequency, helping them, inter alia, to improve search ranking.

Had many great conversations, some with people I knew previously (e.g. the Southampton folks), but also new, including the group at Troy that do lots of work with data.gov.  I was particularly interested in some work using content matching to look for links between otherwise unlinked (or only partly inter-linked) datasets.  Also lots of good presentations including one on trust prediction and a fantastic talk by Mark Bernstein from Eastgate, which he delivered in blank verse!

My own contribution included the poster that Dave@Talis prepared, which was on the web-scale spreading activation work in collaboration with Univ. Athens.  Quite a niche area in a multi-disciplinary conference, so didn’t elicit quite the interest of the social networking posters, but did lead to a small number of in depth discussions.

In addition I gave a talk on the more cognitive/philosophical issues when we start to use the web as an external extension to / replacement of memory, including its impact on education.  Got some good feedback from this.

Closing keynote was from Barry Wellman, the guy who started social network analysis way before they were on computers.  At one point he challenged the Dunbar number [1]. I wondered whether this was due to cognitive extension with address books etc., but he didn’t seem to think so; there is evidence that some large circles predate the web (although maybe not physical address books).  Made me wonder about itinerant tradesmen, tinkers, etc., even with no prostheses. Maybe the numbers sort of apply to any single context, but are repeated for each new context?

WebSci papers:

The HCI Conference – Newcastle

I attended the British HCI conference in Newcastle. This was the 25th conference, and as my very first academic paper in computing [2] was at the first BHCI in 1984, I was pleased to be there at this anniversary.  The paper I was presenting was a retrospective on vfridge, a social networking site dating back to 1999/2000, so it seemed an historic occasion!

As is always the case presentations were all interesting. Strictly BHCI is a ‘second tier’ conference compared with CHI, but why is it that the papers are always more interesting, that I learn more?  It is likely that a fair number of papers were CHI rejects, so it should be the other way round – is it that selectivity and ‘quality’ inevitably become conservative and boring?

Gregory Abowd gave the closing keynote. It was great to see Gregory again, we meet too rarely.  The main focus of his keynote was on three aspects of research: novelty, value and reliability and how his own work had moved within this space over the years.  In particular having two autistic sons has led him in directions he would never have considered, and this immediately valuable work has also created highly novel research. Novelty and value can coexist.

Gregory also reflected on the BHCI conference as it was his early academic ‘home’ when he did his PhD and postdoctoral work here in the late 1980s.  He thought that, rather than being a second best to getting a CHI paper, as many conferences are, it could instead be a place for (not getting the quote quite perfect) “papers that should get into CHI”, by which he meant a proving ground for new ideas that would then go on to be in CHI.

However I initially read the quote differently. BHCI always had a broader concept of HCI compared with CHI’s quite limited scope. That is BHCI as a place that points the way for the future of HCI, just as it was the early nurturing place of MobileHCI.  However CHI has now become much broader in its own conception, so maybe this is no longer necessary. Indeed at the althci session the organisers said that their only complaint was that the papers were not ‘alt’ enough – that maybe ‘alt’ had become mainstream. This prompted Russell Beale to suggest that maybe althci should now be real science such as replication!

Gregory also noted the power of the conference as a meeting ground. It has always been proud of the breadth of international attendance, but perhaps it is UK saturation that should be its real measure of success.  Of course the conference agenda has become so full and international travel so much cheaper than it was, so there is a tendency to go to the more topic specific international conferences and neglect the UK scene.  This is compounded by the relative dearth of small UK day workshops that used to be so useful in nurturing new researchers.

I feel a little guilty here as this was the first BHCI I had been to since it was in Lancaster in 2007 … as Tom McEwan pointed out I always apologise but never come! However, to be fair I have also only been twice to CHI in the last 10 years, and then when it was in Vienna and Florence. I have just felt too busy, so avoiding conferences that I did not absolutely have to attend.

In response to Gregory’s comments, someone, maybe Tom, mentioned that in days of metrics-based research assessment there was a tendency to submit one’s best work to those venues likely to achieve highest impact, hence the draw of CHI. However, I have hardly ever published in CHI and I think only once in TOCHI, yet, according to Microsoft Research, I am currently the most highly cited HCI researcher over the last 5 years … So you don’t have to publish in CHI to get impact!

And incidentally, the vfridge paper had NOT been submitted to CHI, but was specially written for BHCI as it seemed the fitting place to discuss a thoroughly British product 🙂

vfridge paper:

Nottingham MRL

I was at Mixed Reality Lab in Nottingham for Joel Fischer’s PhD viva and while there did a seminar in the afternoon on “extended episodic experience” based on Haliyana Khalid’s PhD work and ideas that arose from it. Basically, whereas ‘user experience’ has become a big issue, most of the work is focused on individual ‘experiences’, whereas much of life consists of ongoing series of experiences (episodes) which together make up the whole experience of interacting with a person or place, following a band, etc.

I had obviously not done a good enough job at wearing Joel down with difficult questions in the PhD viva in the morning as he was there in the afternoon to ask difficult questions back of his own 😉

Docfest – Digital Economy Summer School

The last major event was Docfest, which brought together the PhD students from the digital economy centres from around the country. Not sure of the exact count but just short of 150 participants I think. They come from a wide variety of backgrounds, business, design, computing, engineering, and many are mature students with years of professional experience behind them.

This looked like being a super event; unfortunately I was only able to attend for a day 🙁  However, I had a great evening at the welcome event talking with many of the students and even got to ride in Steve Forshaw’s Sinclair C5!

My contribution to the event was running the first morning session on ‘creativity’. Surprise, surprise this started with a bad ideas session, but new for me too as the largest group I’ve run in the past has been around 30.  There were a number of local Highwire students acting as facilitators for the groups, so I had only to set them off and observe results :-). At the end of the morning I gave some of the theoretical background to bad ideas as a method and in understanding (aspects of) creativity more widely.

Other speakers at the event included Jane Prophet, Chris Csikszentmihalyi and Chris Bonington, so I was sad to miss them; although I did get a fascinating chat with Jane over breakfast in the hotel hearing about her new projects on arts and neural imaging, and on how repetitious writing induces temporary psychosis … That is why the teachers give lines, to send the pupils bonkers!

  1. The idea that there are fundamental cognitive limits on social groups with different sized circles: family~6, extended family~20, village~60, large village~200
  2. I had published previously in agricultural engineering.

Are five users enough?

[An edited version of this post is reproduced at HCIbook online!]

I recently got a Facebook message from a researcher about to do a study who asked, “Do you think 5 (users) is enough?”

Along with Fitts’ Law and Miller’s 7+/-2 this “five is enough” must be among the most widely cited, and most widely misunderstood and misused aphorisms of HCI and usability. Indeed, I feel that this post belongs more in ‘Myth Busters’ than in my blog.

So, do I think five is enough? Of course, the answer is (annoyingly), “it depends”, sometimes, yes, five is enough, but sometimes, fewer: one appropriate user, or maybe no users at all, and sometimes you need more, maybe many more: 20, 100, 1000. But even when five is enough, the reasons are usually not directly related to Nielsen and Landauer’s original paper, which people often cite (although rarely read) and Nielsen’s “Why You Only Need to Test With 5 Users” alert box (probably more often actually read … well at least the title).

The “it depends” is partly dependent on the particular circumstances of the situation (types of people involved, kind of phenomenon, etc.), and partly on the kind of question you want to ask. The latter is the most critical issue, as if you don’t know what you want to know, how can you know how many users you need?

There are several sorts of reasons for doing some sort of user study/experiment, several of which may apply:

1. To improve a user interface (formative evaluation)

2. To assess whether it is good enough (summative evaluation)

3. To answer some quantitative question such as “what % of users are able to successfully complete this task”

4. To verify or refute some hypothesis such as “does eye gaze follow Fitts’ Law”

5. To perform a broad qualitative investigation of an area

6. To explore some domain or the use of a product in order to gain insight

It is common to see HCI researchers very confused about these distinctions, and effectively perform formative/summative evaluation in research papers (1 or 2) where in fact one of 3-6 is what is really needed.

I’ll look at each in turn, but first to note that, to the extent there is empirical evidence for ‘five is enough’, it applies to the first of these only.

I dealt with this briefly in my paper “Human–Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail” in the John Long Festschrift edition of IwC, and quote here:

In the case of the figure of five users, this was developed based on a combination of a mathematical model and empirical results (Nielsen and Landauer 1993). The figure of five users is:

(i) about the optimal cost/benefit point within an iterative development cycle, considerably more users are required for summative evaluation or where there is only the opportunity for a single formative evaluation stage;

(ii) an average over a number of projects and needs to be assessed on a system by system basis; and

(iii) based on a number of assumptions, in particular, independence of faults, that are more reasonable for more fully developed systems than for early prototypes, where one fault may mask another.

We’ll look at this in more detail below, but critically, the number ‘5’ is not a panacea, even for formative evaluation.

As important as the kind of question you are asking is the kind of users you are using. So much of modern psychology is effectively the psychology of first year psychology undergraduates (in the 1950s it was male prisoners). Is this representative? Does this matter? I’ll return to this at the end, but first of all look briefly at each kind of question.

Finally, there is perhaps the real question “will the reviewers of my work think five users is enough” — good publications vs. good science. The answer is that they will certainly be as influenced by the Myth of Five Users as you are, so do good science … but be prepared to need to educate your reviewers too!

formative evaluation – prototyping cycle

As noted formative evaluation was the scope of Nielsen and Landauer’s early work in 1993 that was then cited by Nielsen in his Alert Box in 2000, and which has now developed mythic status in the field.

The 1993 paper was assuming a context of iterative development where there would be many iterations, and asking how many users should be used per iteration, that is how many users should you test before fixing the problems found by those users, and then performing another cycle of user testing, and another. That is, in all cases they considered, the total number of users involved would be far more than five, it is just the number used in each iteration that was lower.

In order to calculate the optimal number of subjects to use per iteration, they looked at:

(i) the cost of performing a user evaluation

(ii) the number of new faults found (additional users will see many of the same faults, so there are diminishing returns)

(iii) the cost of a redevelopment cycle

All of these differ depending on the kind of project, so Nielsen and Landauer looked at a range of projects of differing levels of complexity. By putting them together, and combining with simple probabilistic models of bug finding in software, you can calculate an optimal number of users per experiment.
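As a rough sketch of the kind of calculation involved (not Nielsen and Landauer’s actual figures), assume that each user independently finds any given fault with probability lam, so that n users between them find a fraction 1 - (1 - lam)^n of the faults. The function names, the value of lam, the number of faults and all the costs below are made up purely for illustration.

```python
def fraction_found(n_users, lam):
    """Expected fraction of faults found by n_users, assuming each user
    independently finds any given fault with probability lam."""
    return 1 - (1 - lam) ** n_users


def best_users_per_iteration(lam, n_faults, value_per_fault,
                             cost_per_user, fixed_cost, max_users=50):
    """Number of users giving the best benefit/cost ratio for one iteration.

    All inputs are hypothetical: fixed_cost stands in for the cost of
    setting up the evaluation and the redevelopment cycle that follows it.
    """
    def ratio(n):
        benefit = value_per_fault * n_faults * fraction_found(n, lam)
        cost = fixed_cost + cost_per_user * n
        return benefit / cost

    return max(range(1, max_users + 1), key=ratio)


if __name__ == "__main__":
    # Diminishing returns: each extra user finds fewer new faults.
    for n in (1, 3, 5, 10, 15):
        print(n, "users:", round(fraction_found(n, lam=0.3), 2))

    # A cheap redevelopment cycle versus an expensive one (invented costs).
    for fixed in (3000, 10000):
        n = best_users_per_iteration(lam=0.3, n_faults=40,
                                     value_per_fault=1000,
                                     cost_per_user=500, fixed_cost=fixed)
        print("fixed cost", fixed, "-> best users per iteration:", n)
```

With these invented numbers the more expensive cycle pushes the optimum upwards, which is the general pattern: the ‘right’ number per iteration is a property of the project’s costs, not a constant.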

They found that, depending on the project, the statistics and costs varied, and hence so did the optimal number of users/evaluators (between 7 and 21), with, on the whole, more complex projects (with more different kinds of bugs and more costly redevelopment cycles) having a higher optimal number than simpler projects. In fact all the numbers are larger than five, but five was the number in Nielsen’s earlier discount engineering paper, so the paper did some additional calculations that yielded a different kind of (lower) optimum (3.2 users; pity the last 0.2 user), with five somewhere between 7 and 3 … and a myth was born!

Today, with Web 2.0 and ‘perpetual beta’, agile methods and continuous deployment reduce redevelopment costs to near zero, and so Twidale and Marty argue for ‘extreme evaluation’ where often one user may be enough (see also my IwC paper).

The number also varies through the development process; early on, one user (indeed using it yourself) will find many, many faults that need to be fixed. Later faults become more obscure, or maybe only surface after long-term use.

Of course, if you use some sort of expert or heuristic evaluation, then the answer may be no real users at all!

And anyway all of this is about ‘fault finding’; usability is not about bug fixing but about making things better, and it is not at all clear how useful, if at all, literature on bug fixing is for creating positive experiences.

summative evaluation – is it good enough to release

If you are faced with a product and want to ask “is it good enough?” (which can mean, “are there any usability ‘faults’?”, or, “do people want to use it?”), then five users is almost certainly not enough. To give yourself any confidence of coverage of the kinds of users and kinds of use situations, you may need tens or hundreds of users, very like hypothesis testing (below).

However, the answer here may also be zero users. If the end product is the result of a continuous evaluation process with one, five or some other number of users per iteration, then the number of users who have seen the product during this process may be sufficient, especially if you are effectively iterating towards a steady state where few or no new faults are found per iteration.

In fact, even when there has been a continuous process, the need for long-term evaluation becomes more critical as the development progresses, and maybe the distinction between summative and late-stage formative is moot.

But in the end there is only one user you need to satisfy — the CEO … ask Apple.

quantitative questions and hypothesis testing

(Note, there are real numbers here, but if you are a numerophobe never fear, the next part will go back to qualitative issues, so bear with it!)

Most researchers know that “five is enough” does not apply in experimental or quantitative studies … but that doesn’t always stop them quoting numbers back!

Happily in dealing with more quantitative questions or precise yes/no ones, we can look to some fairly solid statistical rules for the appropriate number of users for assessing different kinds of effects (but do note “the right kind of users” below). And yes, very, very occasionally five may be enough!

Let’s imagine that our hypothesis is that a behaviour will occur in 50% of users doing an experiment. With five users, the probability that we will not see this behaviour in any of them is 1 in 32, which is approximately 3%. That is, if we do not observe the behaviour at all, then we have a statistically significant result at the 5% level (p<0.05) and can reject the hypothesis.

Note that there is a crucial difference between a phenomenon that we expect to see in about 50% of user iterations (i.e. the same user will do it about 50% of the time) and one where we expect 50% of people to do it all of the time. The former we can deal with using a small number of users and maybe longer or repeated experiments, the latter needs more users.

If instead, we wanted to show that a behaviour happens less than 25% of the time, then we need at least 11 users, for 10% 29 users. On the other hand, if we hypothesised that a behaviour happens 90% of the time and didn’t see it in just two users we can reject the hypothesis at significance level of 1%. In the extreme if our hypothesis is that something never happens and we see it with just one user, or if the hypothesis is that it always happens and we fail to see it with one user, in both cases we can reject our hypothesis.
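The figures above all come from the same small calculation, sketched here under the assumption that users behave independently of one another. The function names are my own, chosen for illustration.

```python
def prob_never_seen(rate, n_users):
    """Chance that none of n_users shows a behaviour that each user
    independently shows with probability `rate`."""
    return (1 - rate) ** n_users


def users_needed(rate, alpha=0.05):
    """Smallest number of users such that seeing the behaviour in none of
    them would be significant at level alpha, were the true rate `rate`."""
    n = 1
    while prob_never_seen(rate, n) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    for rate in (0.5, 0.25, 0.10):
        print(f"hypothesised rate {rate:.0%}: need {users_needed(rate)} users")
    # Two users, neither showing a behaviour hypothesised at 90%.
    print("chance of that under the hypothesis:", prob_never_seen(0.9, 2))
```

For hypothesised rates of 50%, 25% and 10% this gives 5, 11 and 29 users respectively at the 5% level.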

The above only pertains when you see samples where either none or all of the users do something. More often we are trying to assess some number. Rather than “does this behaviour occur 50% of the time”, we are asking “how often does this behaviour occur”.

Imagine we have 100 users (a lot more than five!), and notice that 60% do one thing and 40% do the opposite. Can we conclude that in general the first thing is more prevalent? The answer is yes, but only just. Where something is a simple yes/no or either/or choice and we have counted the replies, we have a binomial distribution. If we have n (100) users and the probability of them answering ‘yes’ is p (50% if there is no real preference), then the maths says that the average number of times we expect to see a ‘yes’ response is n x p = 100 x 0.5 = 50 people — fairly obvious. It also says that the standard deviation of this count is sqrt(n x p x (1-p ) ) = sqrt(25) = 5. As a rule of thumb if answers differ by more than 2 standard deviations from the expected value, then this is statistically significant; so 60 ‘yes’ answers vs. the expected 50 is significant at 5%, but 55 would have just been ‘in the noise’.
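The rule of thumb can be written down in a few lines. This is only a sketch of the approximation just described (it treats each answer as an independent yes/no with the same probability), and the function names are illustrative.

```python
import math


def binomial_summary(n, p):
    """Mean and standard deviation of the number of 'yes' answers among
    n users who each say 'yes' independently with probability p."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


def sds_from_expected(observed, n, p=0.5):
    """How many standard deviations `observed` lies from the expected count."""
    mean, sd = binomial_summary(n, p)
    return (observed - mean) / sd


if __name__ == "__main__":
    print(binomial_summary(100, 0.5))    # expected count and its spread
    print(sds_from_expected(60, 100))    # 60 'yes' out of 100
    print(sds_from_expected(55, 100))    # 55 'yes' out of 100
```

Here 60 out of 100 sits right on the two standard deviation boundary, while 55 is only one standard deviation away.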

Now drop this down to 10 users and imagine you have 7 ‘yes’s and 3 ‘no’s. For these users, in this experiment, they answer ‘yes’ more than twice as often as ‘no’, but here this difference is still no more than we might expect by chance. You need at least 8 to 2 before you can say anything more. For five users even 4 to 1 is quite normal (try tossing five coins and see how many come up heads); only if all or none do something can you start to think you are onto something!
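With so few users the two standard deviation rule of thumb is crude, and it is easy to compute the exact chances instead. The sketch below assumes, as before, independent users with no real preference (p = 0.5); the function name is my own.

```python
from math import comb


def tail_probability(k, n, p=0.5):
    """Exact chance of k or more 'yes' answers out of n, if each user
    says 'yes' independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for k in (7, 8, 9):
        print(f"{k} or more out of 10: {tail_probability(k, 10):.3f}")
    for k in (4, 5):
        print(f"{k} or more out of 5:  {tail_probability(k, 5):.3f}")
```

On these exact figures a 7 to 3 split or better turns up by chance about 17% of the time, and 4 to 1 out of five about 19% of the time. Even 8 to 2 is borderline (about 5.5% in one direction, double that if you would have been equally interested in a split the other way), while 9 to 1 is about 1%.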

For more complex kinds of questions such as “how fast”, rather than “how often”, the statistics become a little more complex, and typically more users are needed to gain any level of confidence.

As a rule of thumb some psychologists talk of 20 users per condition, so if you are comparing 4 things then you need 80 users. However, this is just a rule of thumb and some phenomena have very high variability (e.g. problem solving) whereas others (such as low-level motor actions) are more repeatable for an individual and have more consistency between users. For phenomena with very high variability even 20 users per condition may be too few, although within-subjects designs may help if possible. Pilot experiments or previous experiments concerning the same phenomenon are important, but this is probably the time to consult a statistician who can assess the statistical ‘power’ of a suggested design (the likelihood that it will reveal the issue of interest).

qualitative study

Here none of the above applies and … so … well … hmm how do you decide how many users? Often people rely on ‘professional judgement’, which is a posh way of saying “finger in the air”.

In fact, some of the numerical arguments above do still apply (sorry numerophobes). If as part of your qualitative study you are interested in a behaviour that you believe happens about half the time, then with five users you would be very unlucky not to observe it (3% of the time). Or put it another way, if you observe five users you will see around 97% of behaviours that at least half of all users have (with loads and loads of assumptions!).

If you are interested in rarer phenomena, then you need lots more users: for behaviour that you only see in 1 in 10 users, you have only a 40% chance of observing it with 5 users and, perhaps more surprisingly, only a 65% chance of seeing it with 10 users.
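These chances are easy to work out for your own study. The following is a minimal sketch, again with the (large) assumption that users are picked at random and behave independently; the function names and the 80% target are illustrative choices, not recommendations.

```python
def chance_of_seeing(rate, n_users):
    """Chance of seeing a behaviour at least once among n_users, if a
    fraction `rate` of all users exhibit it."""
    return 1 - (1 - rate) ** n_users


def users_for_chance(rate, target):
    """Smallest number of users giving at least `target` chance of
    seeing the behaviour at least once."""
    n = 1
    while chance_of_seeing(rate, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    print(round(chance_of_seeing(0.5, 5), 2))    # common behaviour, 5 users
    print(round(chance_of_seeing(0.1, 5), 2))    # 1-in-10 behaviour, 5 users
    print(round(chance_of_seeing(0.1, 10), 2))   # 1-in-10 behaviour, 10 users
    print(users_for_chance(0.1, 0.8))            # users for an 80% chance
```

For a 1-in-10 behaviour, getting to an 80% chance of seeing it even once takes 16 randomly chosen users.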

However, if you are interested in a particular phenomenon, then randomly choosing people is not the way to go anyway, you are obviously going to select people who you feel are most likely to exhibit it; the aim is not to assess its prevalence in the world, but to find a few and see what happens.

Crucially when you generalise from qualitative results you do it differently.

Now in fact you will see many qualitative papers that add caveats to say “our results only apply to the group studied …”. This may be necessary to satisfy certain reviewers, but is at best disingenuous – if you really believe the results of your qualitative work do not generalise at all, then why are you publishing it – telling me things that I cannot use?

In fact, we do generalise from qualitative work, with care, noting the particular limitations of the groups studied, but still assume that the results are of use beyond the five, ten or one hundred people that we observed. However, we do not generalise through statistics, or from the raw data, but through reasoning that certain phenomena, even if only observed once, are likely to be ones that will be seen again, even if differing in details. We always generalise from our heads, not from data.

Whether it is one, five or more, by its nature deep qualitative data will involve fewer users than more shallow methods such as large scale experiments or surveys. I often find that the value of this kind of deep interpretative data is enhanced by seeing it alongside large-scale shallow data. For example, if survey or log data reveals that 70% of users have a particular problem and you observe two users having the same problem, then it is not unreasonable to assume that the reasons for the problem are similar to those of the large sample — yes you can generalise from two!

Indeed one user may be sufficient (as often happens with medical case histories, or business case studies), but often it is about getting enough users so that interesting things turn up.

exploratory study

This looking for interesting things is often the purpose of research: finding a problem to tackle. Once we have found an interesting issue, we may address it in many ways: formal experiments, design solutions, qualitative studies; but none of these are possible without something interesting to look at.

In such situations, as we saw with qualitative studies in general, the sole criterion for “is N enough” is whether you have learnt something.

If you want to see all, or most of the common phenomena, then you need lots of users. However, if you just want to find one interesting one, then you only need as many as gets you there. Furthermore whilst you often choose ‘representative’ or ‘typical’ users (what is a typical user!) for most kinds of study and evaluation, for exploratory analysis, often extreme users are most insightful; of course you have to work out whether your user or users are so unusual that the things you observe are unique to them … but again real research comes from the head, you have to think about it and make an assessment.

In the IwC paper I discuss some of the issues of single person studies in more detail and Fariza Razak’s thesis is all about this.

the right kind of users

If you have five, fifty or five hundred users, but they are all psychology undergraduates, they are not going to tell you much about usage by elderly users, or by young unemployed people who have left school without qualifications.

Again the results of research ultimately come from the head not the data: you will never get a completely typical or representative sample of users; the crucial thing is to understand the nature of the users you are studying, and to make an assessment of whether the effects you see in them are relevant, and interesting more widely. If you are measuring reaction times, then education may not be a significant factor, but Game Boy use may be.

Many years ago I was approached by a political science PhD student. He had survey data from over 200 people (not just five!), and wanted to know how to calculate error bars to go on his graphs. This was easily done and I explained the procedure (a more systematic version of the short account given earlier). However, I was more interested in the selection of those 200 people. They were Members of Parliament; he had sent the survey to every MP (all 650 of them) and got over 200 replies, a 30% return rate, which is excellent for any survey. However, this was a self-selected group and so I was more interested in whether the grounds for self-selection influenced the answers than in how many of them there were. It is often the case that those with strong views on a topic are more likely to answer surveys on it. The procedure he had used was as good as possible, but, in order to be able to make any sort of statement about the interpretation of the data, he needed to make a judgement. Yet again knowledge is ultimately from the head not the data.
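For completeness, the error bar calculation itself is the easy part. The sketch below uses the same two standard deviation rule of thumb as earlier, applied to a proportion; the counts are hypothetical (not the student’s data) and the function name is my own. Note that it says nothing at all about self-selection: it only describes the spread you would expect if the people replying were a random sample.

```python
import math


def proportion_error_bar(yes_count, n_replies, z=2.0):
    """Rough error bar for a survey proportion: the observed proportion
    plus or minus z standard errors (z=2 is the usual rule of thumb)."""
    p = yes_count / n_replies
    standard_error = math.sqrt(p * (1 - p) / n_replies)
    return p - z * standard_error, p + z * standard_error


if __name__ == "__main__":
    # Hypothetical example: 120 of 200 replies say 'yes'.
    low, high = proportion_error_bar(120, 200)
    print(f"60% observed, roughly {low:.0%} to {high:.0%}")
```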

For small numbers of users these choices are far more critical. Do you try and choose a number of similar people, so you can contrast them, or very different so that you get a spread? There is no right answer, but if you imagine having done the study and interpreting the results this can often help you to see the best choice for your circumstances.

being practical

In reality whether choosing how many, or who, to study, we are driven by availability. It is nice to imagine that we make objective selections based on some optimal criteria — but life is not like that. In reality, the number and kind of users we study is determined by the number and kind of users we can recruit. The key thing is to understand the implications of these ‘choices’ and use these in your interpretation.

As a reviewer I would prefer honesty here, to know how and why users were selected so that I can assess the impact of this on the results. But that is a counsel of perfection, and again good science and getting published are not the same thing! Happily there are lovely euphemisms such as ‘convenience sample’ (who I could find) and ‘snowball sample’ (friends of friends, and friends of friends of friends), which allow honesty without insulting reviewers’ academic sensibilities.

in the end

Is five users enough? It depends: one, five, fifty or one thousand (Google tests live with millions!). Think about what you want out of the study: numbers, ideas, faults to fix, and the confidence and coverage of issues you are interested in, and let that determine the number.

And, if I’ve not said it enough already, in the end good research comes from your head, from thinking and understanding the users, the questions you are posing, not from the fact that you had five users.

references

A. Dix (2010)  Human-Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail. Interacting with Computers, 22(1) pp. 13-27. http://www.hcibook.com/alan/papers/IwC-LongFsch-HCI-2010/

Nielsen, J. (1989). Usability engineering at a discount. In Salvendy, G., and Smith, M.J. (Eds.), Designing and Using Human–Computer Interfaces and Knowledge Based Systems, Elsevier Science Publishers, Amsterdam. 394-401.

Nielsen, J. and Landauer, T. K. 1993. A mathematical model of the finding of usability problems. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands, April 24-29, 1993). CHI ’93. ACM, New York, NY, 206-213. http://doi.acm.org/10.1145/169059.169166

Jakob Nielsen’s Alertbox, March 19, 2000: Why You Only Need to Test With 5 Users. http://www.useit.com/alertbox/20000319.html

Fariza Razak (2008). Single Person Study: Methodological Issues. PhD Thesis. Computing Department, Lancaster University, UK. February 2008. http://www.hcibook.net/people/Fariza/

Michael Twidale and Paul Marty (2004-) Extreme Evaluation.  http://people.lis.uiuc.edu/~twidale/research/xe/

Or … is Amazon becoming the publishing Industry?

A recent Blog Kindle post asked “Is Amazon’s Kindle Destroying the Publishing Industry?”.  The post defends Kindle, seeing the traditional publishers as reactionaries, whose business model depended on paper publishing and, effectively, keeping authors from their public.

However, as an author myself (albeit academic) this seems to completely miss the reasons for the publishing industry.  The printing of physical volumes has long been a minimal part of the value; indeed traditional publishers have made good use of the changes in the physical print industry to outsource actual production.  The core value for the author lies in the things around this: marketing, distribution and payment management.

Of these, distribution is of course much easier now with the web, whether delivering electronic copies, or physical copies via print-on-demand services.  However, the other core values persist – at their best publishers do not ring fence the public from the author, but on the contrary connect the two.

I recall as a child being in the Puffin Club and receiving the monthly magazine.  I could not afford many books at the time, but since have read many of the books described in its pages and recall the excitement of reading those reviews.  A friend has a collection of the early Puffins (1-200) in their original covers; although some stories age, some are better, some worse, still just being a Puffin Book was a pretty good indication it was worth reading.

The myth we are being peddled is of a dis-intermediated networked world where customers connect directly to suppliers, authors to readers1, musicians to fans.  For me, this has some truth: I am well enough known and well enough connected to distribute effectively.  However for most that ‘direct’ connection is mediated by one of a small number of global sites … and an even smaller number of companies: YouTube, Twitter, Google, iTunes, eBay, not to forget Amazon.

For publishing as in other areas, what matters is not physical production, the paper, but the route, the connection, the channel.

And crucially Kindle is not just the device, but the channel.

The issue is not whether Kindle kills the publishing industry, but whether Amazon becomes the publishing industry.  Furthermore, if Amazon’s standard markdown and distribution deals for small publishers are anything to go by, Amazon is hardly going to be a cuddly home for future authors.

To some extent this is an apparently inexorable path that has happened in the traditional industries, with a few large publishing conglomerates buying up the smaller publishing houses, and on the high street a few large bookstore chains such as Waterstones and Barnes & Noble squeezing out the small bookshops (remember “You’ve Got Mail”), and it is hard to have sympathy with Waterstones’ recent financial problems given this history.

Philip Jones of the Bookseller recently blogged about these changes, noting that it is in fact book selling, not publishing that is struggling with profits … even Amazon – no wonder Amazon want more of the publishing action.  However, while Jones notes that the “digital will lead to smaller book chains, stocking fewer titles” in fact “It wasn’t digital that drove this, but it is about to deliver the coup de grâce.”

Which does seem a depressing vision both as author and reader.

  1. Maybe unbound.co.uk is actually doing this – see Guardian article, although it sounds more useful to the already successful writer than the new author.

Mathematics, Jewishness, and Direction

When I was nearly 18 I was part of the British team to the International Mathematical Olympiad (IMO) in Bucharest (see my account of the experience).  The US team were Jewish1, all eight of them.  While this was noteworthy, it was not surprising. There does seem to be a remarkable number of high achieving Jewish mathematicians, including nearly a quarter of Fields Medal recipients (the Maths equivalent of the Nobel Prize) and half of the mathematics members of the US National Academy of Sciences2.

Is this culture or genes, nature or nurture?

As with most things, I’d guess the answer is a mix.  But, if culture, what? There is a tradition of Biblical numerology, but hardly widespread enough to account for such substantial effects. Is it to do with the discipline of learning Hebrew, maybe just discipline, or perhaps is it that mathematics is one of the fields where there has been less prejudice in academic appointments3?

I have just read a paper, “Disembodying Cognition” by Anjan Chatterjee, that may shed a little light on this.  The paper is an excellent overview of current neuroscience research on embodiment and also its limits (hence ‘disembodying’).  One positive embodiment result relates to representations of actions, such as someone kicking a ball, which are often depicted with the agent on the left and the acted upon object on the right.  However, when these experiments are repeated for Arab participants, the direction effects are reversed (p.102).  Chatterjee surmises that this is due to the right-to-left reading direction in Arabic.

In mathematics an equation is strictly symmetrical, simply stating that two things are equal.  However, we typically see equations such as:

y = 3x + 7

where the declarative reading may well be:

y is the same as “3x + 7”

but the more procedural ‘arithmetic’ reading is:

take x, multiply by three, add seven and this gives y

In programming languages this is of course the normal semantics … and can give rise to confusion in statements such as:

x = x + 1

This is not only confusing if read as an equation (which is why some programming languages have := read as “becomes equal to”), but also conflicts with the left-to-right reading of English and European languages.
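As a rough illustration of the two readings, here is a small Python sketch; the function names are invented for the example and are not from any library. Assignment is directional (evaluate the right, store on the left), whereas the equality test is symmetric, so either side order gives the same answer.

```python
# Procedural reading: evaluate the right-hand side, then store it on the left.
x = 5
x = x + 1  # x "becomes" 6; this is not a claim that x equals x + 1

# Declarative reading: '==' only tests whether the two sides agree,
# so the order in which the sides are written makes no difference.
def satisfies(x_value, y_value):
    """Hypothetical helper: does y = 3x + 7 hold for these values?"""
    return y_value == 3 * x_value + 7

def satisfies_flipped(x_value, y_value):
    """The same test written as 3x + 7 = y."""
    return 3 * x_value + 7 == y_value
```

Read as an equation, x = x + 1 could never hold for any number; read as an instruction, it is perfectly sensible.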

COBOL, which was designed for business use, used an English-like syntax, which did read left to right:

ADD Tiree-Total TO Coll-Total GIVING Overall-Total.

Returning to Jewish mathematicians, does the right-to-left reading of Hebrew help in early understanding of algebra?  But if so then surely there should be many more contemporary Arab mathematicians also.  This is clearly not the full story, but maybe it is one contributory factor.

And, at the risk of confusing all of us brought up with the ‘conventional’ way of writing equations, would it be easier for English-speaking children if they were introduced to the mathematically equivalent, but linguistically more comprehensible:

3x + 7 = y

  1. Although they did have to ‘forget’ while they were there, otherwise they would have starved on the all-pork cuisine.
  2. Source jews.org “Jews in Mathematics”.
  3. The Russians did not send a team to the IMO in 1978.  There were three explanations of this: (i) because it was in Romania, (ii) because the Romanians had invited a Chinese team and (iii) because the Russian national mathematical Olympiad had also produced an all Jewish team and the major Moscow university that always admitted the team did not want that many Jewish students.  Whether the last explanation is true or not, it certainly is consonant with the levels of explicit discrimination in the USSR at the time.

A month away brain engaged and blood on the floor

Writing at Glasgow airport waiting for my flight home after nearly a whole month away. I have had a really productive time, first at Talis HQ and Lancs (all in the camper van!) and then on visits to Southampton (experience design and semantic web), Athens (ontologies and brain-like computation) and Konstanz (visualisation and visual analytics).

Loads of intellectual stimulation, but now really looking forward to some time at home to consolidate a little.

During my time away I managed to fall downstairs, bleed profusely over the hotel floor, and break a tooth. My belongings didn’t fare any better: my glasses fell apart and my sandals and suitcase are now holding together by threads … So maybe safer at home for a bit!

Hierarchical grammars for more human-like compiler parsing

Nearly twenty years ago, back when I was in York, one of my student project suggestions was to try to make compiler parsers operate a little more like a human: scanning first for high-level structures like brackets and blocks and only moving on to finer level features later.  If I recall there were several reasons for this, including connections with ‘dynamic pointers’1, but most importantly to help error reporting, especially in cases of mismatched brackets or missing ‘;’ from line ends … still a big problem.

Looking back I can see that one MEng student considered it, but in the end didn’t do it, so it lay amongst that great pile of “things to do one day” and discuss occasionally over tea or beer. I recall too looking at grammar-to-grammar parsers … I guess now-a-days I might imagine using XSLT!

Today, 18 years on, while scanning David Ungar’s publications I discover that he actually did this in the Java parser at Sun2.  I don’t know if this is actually used in the current Java implementations.  Their reasons for looking at the issue were to do with making the parser easier to maintain, so it may actually be that this is being done under the hood, but the benefits for the Java programmer are not being realised.

While I was originally thinking about programming languages, I have more recently found myself using the general methods in anger when doing data cleaning, as often one approaches this in a pipeline fashion, creating elements of structure along the way that are picked up by future parsing/cleaning steps.

To my knowledge there are no general purpose tools for doing this.  So, if anyone is looking for a little project, here is my own original project suggestion from 1993 …

Background
When compilers parse a computer program, they usually proceed in a sequential, left-to-right fashion. The computational requirement of limited lookahead means that the syntax of programming languages must usually be close to LL(1) or LR(1). Human readers use a very different strategy. They scan the text for significant features, building up an understanding of the text in a more top down fashion. The human reader thus looks at the syntax at multiple levels and we can think of this as a hierarchical grammar.

Objective
The purpose of this project is to build a parser based more closely on this human parsing strategy. The target language could be Pascal or C (ADA is probably a little complex!). The parser will operate in two or more passes. The first pass would identify the block structure, for example, in C this would be based on matching various brackets and delimiters `{};,()’. This would yield a partially sequential, partially tree-like structure. Mismatched brackets could be detected at this stage, avoiding the normally confusing error messages generated by this common error. Subsequent passes would `parse’ this tree eventually obtaining a standard syntax tree.
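To make the first pass concrete, here is a minimal Python sketch rather than a real compiler front end. It is an assumption-laden toy: it looks only at the bracket characters (not the `;` and `,` delimiters), ignores string literals and comments, and the names `first_pass`, `PAIRS` and the node layout are invented for the illustration. It builds the nested bracket structure and reports an unclosed bracket at the line where it was opened.

```python
PAIRS = {"(": ")", "{": "}", "[": "]"}
CLOSERS = set(PAIRS.values())

def first_pass(source):
    """Build a tree of bracketed groups and collect mismatch errors."""
    root = {"open": None, "line": 0, "children": []}
    stack = [root]          # stack[0] is always the root
    errors = []
    line = 1
    for ch in source:
        if ch == "\n":
            line += 1
        elif ch in PAIRS:
            node = {"open": ch, "line": line, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif ch in CLOSERS:
            # Find the nearest enclosing opener that this closer matches.
            depth = next((i for i in range(len(stack) - 1, 0, -1)
                          if PAIRS[stack[i]["open"]] == ch), None)
            if depth is None:
                errors.append(f"line {line}: unexpected '{ch}'")
            else:
                # Anything opened inside the matched group was never closed.
                for node in stack[depth + 1:]:
                    errors.append(
                        f"line {node['line']}: '{node['open']}' is never closed")
                del stack[depth:]
    for node in stack[1:]:
        errors.append(f"line {node['line']}: '{node['open']}' is never closed")
    return root, errors
```

For the input `"int f() {\n  g(1;\n}\n"` this sketch reports that the `(` on line 2 is never closed, instead of a confusing message many lines later. Later passes would then walk the `children` lists to build a full syntax tree.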

Options
Depending on progress, the project can develop in various ways. One option is to use the more human-like parsing to improve error reporting, for example, the first pass could identify the likely sites for where brackets have been missed by analysing the indentation structure of the program. Another option would be to build a YACC-like tool to assist in the production of multi-level parsers.
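The indentation idea can also be sketched in a few lines of Python. This is only a heuristic under strong assumptions: the code is indented consistently with spaces, one level per block, braces inside strings or comments are ignored, and the function name and `indent_width` parameter are invented for the example. It flags lines whose indentation disagrees with the running brace depth; the first flagged line is the likely site of a missing brace.

```python
def likely_missing_brace_sites(source, indent_width=4):
    """Return line numbers where indentation and brace depth disagree."""
    suspects = []
    depth = 0
    for number, text in enumerate(source.splitlines(), start=1):
        stripped = text.strip()
        if not stripped:
            continue  # blank lines carry no indentation evidence
        indent = len(text) - len(text.lstrip(" "))
        # A line starting with '}' is conventionally indented at the outer level.
        leading_closers = len(stripped) - len(stripped.lstrip("}"))
        expected = depth - leading_closers
        if indent // indent_width != expected:
            suspects.append(number)
        depth += stripped.count("{") - stripped.count("}")
    return suspects
```

On a function where the closing brace of an inner `if` block has been forgotten, the first suspect is the line that drops back to the outer indentation, which is usually where the programmer meant the block to end.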

Reading

1.  S. P. Robertson, E. F. Davis, K. Okabe and D. Fitz-Randolf, “Program comprehension beyond the line”3, pp. 959-963 in Proceedings of Interact’90, North-Holland, 1990.
2.  Recommended reading from compiler construction course
3.  YACC manual from UNIX manual set.

  1. For more on Dynamic Pointers see my first book “Formal Methods for Interactive Systems”, and a CSCW journal paper “Dynamic pointers and threads”.
  2. Modular parser architecture with mini parsers. D M Ungar, US Patent 7,089,541, 2006.
  3. Incidentally, “Program comprehension beyond the line” is a fantastic paper both for its results and also methodologically. In the days when eye-tracking was still pretty complex (maybe still now!), they wanted to study program comprehension, so instead of following eye gaze, they forced experimental subjects to physically scroll through code using a single-line browser.

the real tragedy of the commons

I’ve just been reviewing a paper that mentions the “tragedy of the commons”1  and whenever I read or hear the phrase I feel the hackles on the back of my neck rise.

Of course the real tragedy of the commons was not free-riding and depletion by common use, but the rape of the land under mass eviction or enclosure movements when they ceased to be commons.  The real tragedy of “the tragedy of the commons” as a catch phrase is that it is often used to promote the very same practices of centralisation.  Where common land has survived today, just as in the time before enclosures and clearances, it is still managed in a collaborative way both for the people now and for the sake of future generations.  Indeed on Tiree, where I live, there are large tracts of common grazing land managed in just such a way.

It is good to see that the Wikipedia article on “Tragedy of the Commons” does give a rounded view on the topic, including reference to an historical and political critique by Ian Angus2.

The paper I was reading was not alone in uncritically using the phrase.  Indeed in “A Framework for Web Science”3 we read:

In a decentralised and growing Web, where there are no “owners” as such, can we be sure that decisions that make sense for an individual do not damage the interests of users as a whole? Such a situation, known as the ‘tragedy of the commons’, happens in many social systems that eschew property rights and centralised institutions once the number of users becomes too large to coordinate using peer pressure and moral principles.

In fact I do have some sympathy with this as the web involves a vast number of physically dispersed users who are perhaps “too large to coordinate using peer pressure and moral principles”.  However, what is strange is that the web has raised so many modern counter examples to the tragedy of the commons, not least Wikipedia itself.  In many open source projects people work in what is effectively a form of gift economy, where, if there is any reward, it is in the form of community or individual respect.

Clearly, there are examples in the world today where many individual decisions (often for short term gain) lead to larger scale collective loss.  This is most clearly evident in the environment, but also the recent banking crisis, which was fuelled by the desire for large mortgages and general debt-led lives.  However, these are exactly the opposite of the values surrounding traditional common goods.

It may be that the problem is not so much that large numbers of people dilute social and moral pressure, but that the impact of our actions becomes too diffuse to be able to appreciate when we make our individual life choices.  The counter-culture of many parts of the web may reflect, in part, the way in which aspects of the web can make the impact of small individual actions more clear to the individual and more accountable to others.

  1. Garrett Hardin, “The Tragedy of the Commons”, Science, Vol. 162, No. 3859 (December 13, 1968), pp. 1243-1248. … and here is the danger of citation counting as a quality metric, I am citing it because I disagree with it!
  2. Ian Angus. The Myth of the Tragedy of the Commons. Socialist Voice, August 24, 2008
  3. Berners-Lee, T., Hall, W., Hendler, J. A., O’Hara, K., Shadbolt, N. and Weitzner, D. J. (2006) A Framework for Web Science. Foundations and Trends in Web Science, 1 (1). pp. 1-130.  http://eprints.ecs.soton.ac.uk/13347/

Dhaval Vyas’ PhD

I was very pleased to be part of the committee for Dhaval Vyas’ PhD defense last Friday.

Dhaval’s thesis “Designing for Awareness: An Experience-focused HCI Perspective”1 is well worth a read, with ethnographic studies of academics (!) and designers at work, and also technical interventions in both situations: Panorama, a public screen photo-montage-style display, and CAM, a way to tag and discuss physical objects.

While reading the thesis I also realised that ‘awareness’ is one of those slippery words that has a slightly technical meaning in CSCW, one you sort of understand by example and diffusion, but which is surprisingly hard to pin down. Dhaval does not have a precise definition and neither do I: the HCI textbook says “generally having some feeling for what other people are doing or have been doing” – hardly precise!  I have some half-formed thoughts on this, but will leave them for another post.

I first knew Dhaval when he was doing his MSc at Lancaster in 2001/2002.  He has always been dedicated to pursuing an academic career and it is wonderful 10 years on to see this come to fruition.  There have been a lot of barriers on the way and so this is a testament to Dhaval’s strength of character as well as intellectual attainment.

  1. Dhaval Vyas (2011). Designing for Awareness: An Experience-focused HCI Perspective. University of Twente. download thesis (PDF 4.9Mb). DOI: 10.3990/1.9789036531351 (not yet resolving).

announcing Tiree Tech Wave!

Ever since I came to Tiree I’ve had a vision of bringing people here, to share some of the atmosphere and work together.  A few of you have come on research visits and we have had some really productive times.  Others have said they wished they could come sometime.

Well now is your chance …

Come to Tiree Tech Wave in March to make, talk and play at the wind-ripping edge of digital technology.

Every year Tiree hosts the Wave Classic, a key international wind surfing event.  Those of us at the edge of the digital wave do not risk cold seas and bodily injury, but there is something of the same thrill as we explore the limits of code, circuit boards and social computation.

The cutting edge of wind-surfing boards is now high technology, but typically made by artisan craftsfolk, themselves often surfers.  Similarly hardware platforms such as Arduino, mobile apps for iPhone and Android, and web mashups enabled by public APIs and linked data are all enabling a new maker culture, challenging the hegemony of global corporations.

The Western Celtic fringes were one of the oases of knowledge and learning during the ‘dark ages’.  There is something about the empty horizon that helped the hermit to focus on God and inspired a flowering of decorative book-making, even in the face of battering storms of winter and Viking attacks of summer; a starkness that gave scholars time to think in peace between danger-fraught travel to other centres of learning across Europe.

Nowadays regular Flybe flights and Calmac ferries reduce the risk of Viking attacks whilst travelling to the isles, broadband Internet and satellite TV invade the hermit cell, and double glazing and central heating mollify the elements.  Yet there is still a rawness that helps focus the mind, a slightly more tenuous connection to the global infrastructure that fosters a spirit of self-reliance and independence.

Over a long weekend 17 – 21 March (TBC), we plan what I hope will be a semi-regular event.  A time to step out, albeit momentarily, from a target-driven world, to experiment and play with hardware and software, to discuss the issues of our new digital maker culture, what we know and what we seek to understand, and above all to make things together.

This is all about technology and people: the physical device that sits in our hands, the data.gov.uk mashup that tells us about local crime, the new challenges to personal privacy and society and the nation state.

Bring your soldering iron, and Arduino boards, your laptop and API specs, your half-written theses and semi-formed ideas, your favourite book or even well-loved eReader (!).  The format will be informal, with lots of time to work hands-on together; however, there will be the opportunity for short talks/demos/how-to-do-it sessions.  Also, if there is demand, I’d  be happy to do some more semi-formal tutorial sessions and maybe others would too (Arduino making, linked data).

Currently we have no idea whether there will be three or three hundred people interested, but we are aiming for something like 15 – 30 participants.  We’ll keep costs down, probably around £70 for meeting rooms, lunches, etc. over the five days, but will confirm that and more details shortly.

Follow on Twitter at @tireetechwave and the website will be at tireetechwave.com. However, it is still ‘under development’, so don’t be surprised at the odd glitch over the next couple of weeks as we sort out details.

If you are interested in coming or want to know more, mail me or Graham Dean.

Back to Tiree – and being ‘half-time’

I’m on the ferry on the way back to Tiree. It’s been 2 months since I was home and then only one long weekend since the end of August, so it seems both familiar and strange sitting on the Calmac ferry again as it makes its way out of Oban.

Last autumn I had a similar long stay away, then mostly in the camper van near the University as I was still working full time at Lancaster. This year I am working half time at Lancaster, but also half-time for Talis and for the first three months at Talis spending half my time on site at the Talis offices in Birmingham. After that I’ll be doing my Talis job based from home, only going down more occasionally, so after Christmas will get more time at home.

Instead of my camper van I’ve been staying a lot at the ‘Talis house’, a house near Solihull for small off-site meetings and for those like me who live a long way away from Talis’ Birmingham offices (others live in France, Italy, and the USA). It was rather claustrophobic last autumn spending most of my weekends in my office at Lancaster, so having Talis house as a base has been good. However, I do miss that snug feeling in the back of the camper van hunkering under the bedclothes, with a take-away on my knees and watching a DVD, while the van rocked in time to the whistling wind outside.

Working half time for Talis has also imposed a discipline on my time working in my University role. Since last Christmas I have been formally working half time at Lancaster (certainly getting half pay!), but as those who work in the universities know, it is hard to put a limit on things. The idea was that this meant I would get half my time to do ‘my stuff’, research and writing. Of course I knew cutting my old 80-hour weeks down to 20 or even 40 would not happen, but I would at least get a little more time than I have become used to.

One of the half expected and half surprising things about the shift to half-time working for the University last January was the way other people dealt with it.

I guess for years I have implicitly ‘educated’ both fellow academics and students in their expectations; whenever there was something to be done, a report to read or write, I would say things like “ah this weekend I’ve already got this other task to do, but I’ll do it the next weekend” — basically assuming that weekends and evenings, strictly the unpaid times, were the times when things happened. After a bit students would get used to giving me things on Friday in the expectation that I would then have time to do it.

When I shifted to half time people would extend this notion and say “ah now you have more time you can do X”: reviews, reading student work, etc. As I said this was half expected, I had the feeling I would need to re-educate people. However, what surprised me was not that people acted this way, but that they said it, and even wrote it in emails. I would have thought that when they saw it explicitly in front of them they would think, “oh no Alan now has less time for these things”, but no; it is amazing how little we notice of what we say and do.

Anyway now things are different. Instead of it being ‘my time’ that my academic life intruded into, it is now Talis’ time and this is something others can respect more, and I guess I also respect more than my own time.

So how is it working — really being a half-time academic?

In fact of course, I still work most weekends and long days, so I have somewhat more than a full-time week of effort, so I am not yet down to 20 hours of university work, but certainly a lot less time than when I was simply trying to protect my own (unpaid!) time.

In January when I shifted to half-time, I said I’d do a day a week while at home effectively eating nearly half of my ‘half time’, meaning I was expecting to spend about 60 days a year away from home whether on site in Lancaster or travelling. In fact during this Autumn alone, by Christmas I will have spent 53 days either on site in Lancaster or travelling on University business, that is more than 2/3 of the formal 75 working days in the period and nearly all my annual ‘not at home’ Lancs working days! This doesn’t seem to add up given 1/2 time spent in B’ham, but of course the 53 days of Lancaster time includes many weekends away while travelling that I wasn’t used to counting when a ‘full time’ academic.

I clearly need to cut this down further! However, even now, being stricter than I was with ‘my time’, cracks are beginning to show. I can see students getting unhappy as it takes me longer to find time to read things they have written, and colleagues patiently realising that email to me is getting even less reliable. So much of the life of an academic depends on things done in ‘extra time’ whether weekends or evenings, or in my case earlier in the year unpaid time; when you cut back on that things simply do not happen.

From Christmas I will not have the imposed discipline of days at the offices at Talis, so will need to maintain this more for myself. However, the last few months have helped and I will certainly keep careful records to make sure Talis gets its fair share of my time and that the University does not consume so much of my ‘own’ time, as rest is also part of working well.

Even though I have effectively ‘used up’ most of my university on-site/travelling days, I will of course not say “no more until next September” (!), but will at least try to control it more. And I will also try to let some of the more balanced view of work and life I am learning at Talis influence my attitudes at the University.

And no, I won’t be reading email this evening.