Wikipedia blackout and why SOPA winging gets up my nose

Posted on January 18, 2012 by alan

Nobody on the web can be unaware of the Wikipedia blackout, and if they haven’t heard of SOPA or PIPA before will have now. Few who understand the issues would deny that SOPA and PIPA are misguided and ill-informed, even Apple and other software giants abandoned it, and Obama’s recent statement has effectively scuppered SOPA in its current form. However, at the risk of apparently annoying everyone, am I the only person who finds some of the anti-SOPA rhetoric at best naive and at times simply arrogant?

The ignorance behind SOPA and a raft of similar legislation and court cases across the world is deeply worrying. Only recently I posted about the recent NLA case in the UK, that creates potential copyright issues when linking on the web reminiscent of the Shetland Times case nearly 15 years ago.

However, that is no excuse for blinkered views on the other side.

I got particularly fed up a few days ago reading an article “Lockdown: The coming war on general-purpose computing”¹ by copyright ativist Cory Doctorow based on a keynote he gave at the Chaos Computer Congress. The argument was that attempts to limit the internet destroyed the very essence of the computer as a general purpose device and were therefore fundamentally wrong. I know that Sweden has just recognised Kopimism as a religion, but still an argument that relies on the inviolate nature of computation leaves one wondering.

The article also argued that elected members of Parliament and Congress are by their nature layfolk, and so quite reasonably not expert in every area:

“And yet those people who are experts in policy and politics, not technical disciplines, still manage to pass good rules that make sense.“

Doctorow has trust in the nature of elected democracy for every area from biochemistry to urban planning, but not information technology, which, he asserts, is in some sense special.

Now even as a computer person I find this hard to swallow, but what would a geneticist, physicist, or even a financier using the Black-Scholes model make of this?

Furthermore, Congress is chastised for finding unemployment more important than copyright, and the UN for giving first regard to health and economics — of course, any reasonable person is expected to understand this is utter foolishness. From what parallel universe does this kind of thinking emerge?

Of course, Doctorow takes an extreme position, but the Electronic Freedom Foundation’s position statement, which Wikipedia points to, offers no alternative proposals and employs scaremongering arguments more reminiscent of the tabloid press, in particular the claim that:

“venture capitalists have said en masse they won’t invest in online startups if PIPA and SOPA pass“

This turns out to be a Google sponsored report² and refers to “digital content intermediaries (DCIs)“, those “search, hosting, and distribution services for digital content“, not startups in general.

When this is the quality of argument being mustered against SOPA and PIPA is there any wonder that Congress is influenced more by the barons of the entertainment industry?

Obviously some, such as Doctorow and more fundamental anti-copyright activists, would wish to see a completely unregulated net. Indeed, this is starting to be the case de facto in some areas, where covers are distributed pretty freely on YouTube without apparently leading to a collapse in the music industry, and offering new bands much easier ways to make an initial name for themselves. Maybe in 20 years time Hollywood will have withered and we will live off a diet of YouTube videos :-/

I suspect most of those opposing SOPA and PIPA do not share this vision, indeed Google has been paying 1/2 million per patent in recent acquisitions!

I guess the idealist position sees a world of individual freedom, but it is not clear that is where things are heading. In many areas online distribution has already resulted in a shift of power from the traditional producers, the different record companies and book publishers (often relatively large companies themselves), to often one mega-corporation in each sector: Amazon, Apple iTunes. For the latter this was in no small part driven by the need for the music industry to react to widespread filesharing. To be honest, however bad the legislation, I would rather trust myself to elected representatives, than unaccountable multinational corporations³.

If we do not wish to see poor legislation passed we need to offer better alternatives, both in terms of the law of the net and how we reward and fund the creative industries. Maybe the BBC model is best, high quality entertainment funded by the public purse and then distributed freely. However, I don’t see the US Congress nationalising Hollywood in the near future.

Of course copyright and IP is only part of a bigger picture where the net is challenging traditional notions of national borders and sovereignty. In the UK we have seen recent cases where Twitter was used to undermine court injunctions. The injunctions were in place to protect a few celebrities, so were ‘fair game’ anyway, and so elicited little public sympathy. However, the Leveson Inquiry has heard evidence from the editor of the Express defending his paper’s suggestion that the McCann’s may have killed their own daughter; we expect and enforce (the Expresss paid £500,000 after a libel case) standards in the print media, would we expect less if the Express hosted a parallel new website in the Cayman Islands?

Whether it is privacy, malware or child pornography, we do expect and need to think of ways to limit the excess of the web whilst preserving its strengths. Maybe the solution is more international agreements, hopefull not yet more extra-terratorial laws from the US⁴.

Could this day without Wikipedia be not just a call to protest, but also an opportunity to envision what a better future might be.

blanked out today, see Google cache[back]
By Booz&Co, which I thought at first was a wind-up, but appears to be a real company![back]
As I write this, I am reminded of the corporation-controlled world of Rollerball and other dystopian SciFi.[back]
How come there is more protest over plans to shut out overseas web sites than there is over unmanned drones performing extra-judicial executions each week.[back]

tread lightly — controlling user experience pollution

Posted on January 14, 2012 by alan

When thinking about usability or user experience, it is easy to focus on the application in front of us, but the way it impacts its environment may sometimes be far more critical. However, designing applications that are friendly to their environment (digital and physical) may require deep changes to the low-level operating systems.

I’m writing this post effectively ‘offline’ into a word processor for later upload. I sometimes do this as I find it easier to write without the distractions of editing within a web browser, or because I am physically disconnected from the Internet. However, now I am connected, and indeed I can see I am connected as a FTP file upload is progressing, it is just that anything else network-related is stalled.

The reason that the FTP upload is ‘hogging’ the network is, I believe, due to a quirk in the UNIX scheduling system, which was, paradoxically, originally intended to improve interactivity.

UNIX, which sits underneath Mac OS, is a multiprocessing operating system running many programs at once. Each process has a priority, called its ‘niceness‘, which can be set explicitly, but is also tweaked from moment to moment by the operating system. One of the rules for ‘tweaking’ it is that if a process is IO-bound, that is if it is constantly waiting for input or output, then its niceness is decreased, meaning that it is given higher priority.

The reason for this rule is partly to enhance interactive performance in the old days of command line interfaces; an interactive program would spend lots of time waiting for the user to enter something, and so its priority would increase meaning it would respond quickly as soon as the user entered anything. The other reason is that CPU time was seen as the scarce resource, so that processes that were IO bound were effectively being ‘nicer’ to other processes as they let them get a share of the precious CPU.

The FTP program is simply sitting there shunting out data to the network, so is almost permanently blocked waiting for the network as it can read from the disk faster than the network can transmit data. This means UNIX regards it as ‘nice’ and ups its priority. As soon as the network clears sufficiently, the FTP program is rescheduled and it puts more into the network queue, reads the next chunk from disk until the network is again full to capacity. Nothing else gets a chance, no web, no email, not even a network trace utility.

I’ve seen the same before with a database server on one of Fiona’s machines — all my fault. In the MySQL manual it suggested that you disable indices before large bulk updates (e.g. ingesting a file of data) and then re-enable them once the update is finished as indexing is more efficient on lots of data than one at a time. I duly did this and forgot about it until Fiona noticed something was wrong on the server and web traffic had ground to a near halt. When she opened a console on the server, she found that it seemed quiet, very little CPU load at all, and was puzzled until I realised it was my indexing. Indexing requires a lot of reading and writing data to and from disk, so MySQL became IO-bound, was given higher priority, as soon as the disk was free it was rescheduled, hit the disk once more … just as FTP is now hogging the network, MySQL hogged the disk and nothing else could read or write. Of course MySQL’s own performance was fine as it internally interleaved queries with indexing, it is just everything else on the system that failed.

These are hard scenarios to design for. I have written before (“why software need never hang“) about the way application designers do not think sufficiently about potential delays due to slow networks, or broken connections. However, that was about the applications that are suffering. Here the issue is not that the FTP program is badly designed for its delays, it is still responding very happily, just that it has had a knock on effect on the rest of the system. It is like cleaning your sink with industrial bleach — you have a clean house within, but pollute the watercourse without.

These kind of issues are not related solely to network and disk, any kind of resource is limited and profligacy causes damage in the digital world as much as in the physical environment.

Some years ago I had a Symbian smartphone, but it proved unusable as its battery life rarely exceeded 40 minutes from full charge. I thought I had a duff battery, but later realised it was because I was leaving applications on the phone ‘open’. For me I went to the address book, looked up a number, and that was that, I then maybe turned the phone off or switched to something else without ‘exiting’ the address book. I was treating the phone like every previous phone I had used, but this one was different, it had a ‘real’ operating system, opening the address book launched the address book application, which then kept on running — and using power — until it was explicitly closed, a model that is maybe fine for permanently plugged in computers, but disastrous for a moble phone.

When early iPhones came out iOS was criticised for being single threaded, that is not having lots of things running in the ‘background’. However, this undoubtedly helped its battery life. Now, with newer versions of iOS, it has changed and there are lots of apps running at once, and I have noticed the battery life reducing, is that simply the battery wearing out with age or the effect of all those apps running?

Power is of course not just a problem for smartphones, but for any laptop. I try to closedown applications on my Mac when I am working without power as I know some programs just eat CPU when they are apparently idle (yes, Firefox, it’s you I’m talking about). And from an environmental point of view, lower power consumption when connected would also be good. My hope was that Apple would take the lessons learnt in the early iOS to change the nature of their mainstream OS, but sadly they succumbed to the pressure to make iOS a ‘proper’ OS!

Of course the FTP program could try to be friendly, perhaps when it is not the selected window deliberately throttle its network activity. But then the 4 hour upload would take 8 hours, instead of 20 minutes left at this point, I’d be looking forward to another 4 hours and 20 minutes, and I’d be complaining about that.

The trouble is that there needs to be better communication, more knowledge shared, between application and operating system. I would like FTP to use all the network capacity that it can, except when I am interacting with some other program. Either FTP needs to say to the OS “hey here’s a packet, send it when there’s a gap”¹, or the OS needs some way for applications to determine current network state and make decisions based on that. Sometimes this sort of information is easily available, more often it is either very hard to get at or not available at all.

I recall years ago when internet was still mainly through pay-per-minute dial-up connections. You could set your PC to automatically dial when the internet was needed. However, some programs, such as chat, would periodically check with a central server to see if there was activity, this would cause the PC to dial-up the ISP. If you were lucky the PC also had an auto-disconnect after a period of inactivity, if you were not lucky the PC would connect at 2am and by the morning you’d find yourself with a phone bill more than your weeks’ wages.

When we were designing onCue at aQtive, we wanted to be able to connect to the Internet when it was available, but avoid bankrupting our users. Clearly somewhere in the TCP/IP stack, the layers of code over the network, at some level deep down it knew whether we were connected. I recall we found a very helpful function in the Windows API called something like “isConnected”². Unfortunately, it worked by attempting to send a network packet and returning true if it succeeded and false if it failed. Of course sending the test packet caused the PC to auto-dial …

And now there is just 1 minute and 53 seconds left on the upload, so time to finish this post before I get on to garbage collection.

This form of “send when you can” would also be useful in cellular networks, for example when syncing photos.[back]
I had a quick peek, and fund that Windows CE has a function called InternetGetConnectedState. I don’t know if this works better now.[back]

changing rules of copyright on the web – the NLA case

Posted on December 26, 2011 by alan

I’ve been wondering about the broader copyright implications of a case that went through the England and Wales Court of Appeal earlier this year. The case was brought by the NLA (Newspaper Licensing Agency) against Meltwater, who run commercial media-alert services; for example telling you or your company when and where you have been mentioned in the press.

While the case is specifically about a news service, it appears to have broader implications for the web, not least because it makes new judgements on:

the use of titles/headlines — they are copyright in their own right
the use of short snippets (in this case no more than 256 characters) — they too potentially infringe copyright
whether a URL link is sufficient acknowledgement of copyright material for fair use – it isn’t!

These, particularly the last, seems to have implications for any form of publicly available lists, bookmarks, summaries, or even search results on the web. While NLA specifically allow free services such as Google News and Google Alerts, it appears that this is ‘grace and favour’, not use by right. I am reminded of the Shetland case¹, which led to many organisations having paranoid policies regarding external linking (e.g. seeking explicit permission for every link!).

So, in the UK at least, web law copyright law changed significantly through precedent, and I didn’t even notice at the time!

In fact, the original case was heard more than a year ago November 2010 (full judgement) and then the appeal in July 2011 (full judgement), but is sufficiently important that the NLA are still headlining it on their home page (see below, and also their press releases (PDF) about the original judgement and appeal). So effectively things changed at least at that point, although as this is a judgement about law, not new legislation, it presumably also acts retrospectively. However, I only recently became aware of it after seeing a notice in The Times last week – I guess because it is time for annual licences to be renewed.

Newspaper Licensing Agency (home page) on 26th Dec 2011

The actual case was, in summary, as follows. Meltwater News produce commercial media monitoring services, that include the title, first few words, and a short snippet of news items that satisfy some criteria, for example mentioning a company name or product. NLA have a license agreement for such companies and for those using such services, but Meltwater claimed it did not need such a license and, even if it did, its clients certainly did not require any licence. However, the original judgement and the appeal found pretty overwhelmingly in favour of NLA.

In fact, my gut feeling in this case was with the NLA. Meltwater were making substantial money from a service that (a) depends on the presence of news services and (b) would, for equivalent print services, require some form of licence fee to be paid. So while I actually feel the judgement is fair in the particular case, it makes decisions that seem worrying when looked at in terms of the web in general.

Summary of the judgement

The appeal supported the original judgement so summarising the main points from the latter (indented text quoting from the text of the judgement).

Headlines

The status of headlines (and I guess by extension book titles, etc.) in UK law are certainly materially changed by this ruling (para 70/71), from previous case law (Fairfax, Para. 62).

Para. 70. The evidence in the present case (incidentally much fuller than that before Bennett J in Fairfax -see her observations at [28]) is that headlines involve considerable skill in devising and they are specifically designed to entice by informing the reader of the content of the article in an entertaining manner.

Para. 71. In my opinion headlines are capable of being literary works, whether independently or as part of the articles to which they relate. Some of the headlines in the Daily Mail with which I have been provided are certainly independent literary works within the Infopaq test. However, I am unable to rule in the abstract, particularly as I do not know the precise process that went into creating any of them. I accept Mr Howe’s submission that it is not the completed work as published but the process of creation and the identification of the skill and labour that has gone into it which falls to be assessed.

Links and fair use

The ruling explicitly says that a link is not sufficient acknowledgement in terms of fair use:

Para. 146. I do not accept that argument either. The Link directs the End User to the original article. It is no better an acknowledgment than a citation of the title of a book coupled with an indication of where the book may be found, because unless the End User decides to go to the book, he will not be able to identify the author. This interpretation of identification of the author for the purposes of the definition of “sufficient acknowledgment” renders the requirement to identify the author virtually otiose.

Links as copies

Para 45 (not part of the judgement, but part of NLA’s case) says:

Para. 45. … By clicking on a Link to an article, the End User will make a copy of the article within the meaning of s. 17 and will be in possession of an infringing copy in the course of business within the meaning of s. 23.

The argument here is that the site has some terms and conditions that say it is not for ‘commercial user’.

As far as I can see the judge equivocates on this issue, but happily does not seem convinced:

Para 100. I was taken to no authority as to the effect of incorporation of terms and conditions through small type, as to implied licences, as to what is commercial user for the purposes of the terms and conditions or as to how such factors impact on whether direct access to the Publishers’ websites creates infringing copies. As I understand it, I am being asked to take a broad brush approach to the deployment of the websites by the Publishers and the use by End Users. There is undoubtedly however a tension between (i) complaining that Meltwater’s services result in a small click-through rate (ii) complaining that a direct click to the article skips the home page which contains the link to the terms and conditions and (iii) asserting that the End Users are commercial users who are not permitted to use the websites anyway.

Free use

Finally, the following extract suggests that NLA would not be seeking to enforce the full licence on certain free services:

Para. 20. The Publishers have arrangements or understandings with certain free media monitoring services such as Google News and Google Alerts whereby those services are currently licensed or otherwise permitted. It would apparently be open to the End Users to use such free services, or indeed a general search engine, instead of a paid media monitoring service without (currently at any rate) encountering opposition from the Publishers. That is so even though the End Users may be using such services for their own commercial purposes. The WEUL only applies to customers of a commercial media monitoring service.

Of course, the fact that they allow it without licence, suggests they feel the same copyright rules do apply, that is the search collation services are subject to copyright. The judge does not make a big point of this piece of evidence in any way, which would suggest that these free services do not have a right to abstract and link. However, the fact that Meltwater (the agency NA is acting against) is making substantial money was clearly noted by the judge, as was the fact that users could choose to use alternative services free.

Thinking about it

As noted my gut feeling is that fairness goes to the newspapers involved; news gathering and reportingis costly, and openly accessible online newspapers are of benefit to us all; so, if news providers are unable to make money, we all lose.

Indeed, years ago in dot.com days, at aQtive we were very careful that onCue, our intelligent internet sidebar, did not break the business models of the services we pointed to. While we effectively pre-filled forms and submitted them silently, we did not scrape results and present these directly, but instead sent the user to the web page that provided the information. This was partly out a feeling that this was the right and fair thing to do, partly because if we treated others fairly they would be happy for us to provide this value-added service on top of what they provided, and partly because we relied on these third-party services for our business, so our commercial success relied on theirs.

This would all apply equally to the NLA v. Meltwater case.

However, like the Shetland case all those years ago, it is not the particular of the case that seems significant, but the wide ranging implications. I, like so many others, frequently cite web materials in blog posts, web pages and resource lists by title alone with the words live and pointing to the source site. According to this judgement the title is copyright, and even if my use of it is “fair use” (as it normally would be), the use of the live link is NOT sufficient acknowledgement.

Maybe, things are not quite so bad as they seem. In the NLA vs. Meltwater case, the NLA had a specific licence model and agreement. The NLA were not seeking retrospective damages for copyright infringement before this was in place, merely requiring that Meltwater subscribe fully to the licence. The issue was not that just that copyright had been infringed, but that it had been when there was a specific commercial option in place. In UK copyright law, I believe, it is not sufficient to say copyright has been infringed, but also to show that the copyright owner has been materially disadvantaged by the infringement; so, the existence of the licence option was probably critical to the specific judgement. However the general principles probably apply to any case where the owner could claim damage … and maybe claim so merely in order to seek an out-of-court settlement.

This case was resolved five months ago, and I’ve not heard of any rush of law firms creating vexatious copyright claims. So maybe there will not be any long-lasting major repercussions from the case … or maybe the storm is still to come.

Certainly, the courts have become far more internet savvy since the 1990s, but judges can only deal with the laws they are give, and it is not at all clear that law-makers really understand the implications of their legislation on the smooth running of the web.

This was the case in the late 1990s where the Shetland Times sued the Shetland News for including links to its articles. Although the particular case involved material that appeared to be re-badged, the legal issues endangered the very act of linking at all. See NUJ Freelance “NUJ still supports Shetland News in internet case“, BBC “Shetland Internet squabble settled out of court“, The Lawyer “Shetland Internet copyright case is settled out of court“[back]

The Great Apple Apartheid

Posted on November 4, 2011 by alan

In days gone by boarding houses and shops had notices saying “Irish and Blacks not welcome“. These days are happily long past, but today Apple effectively says “poor and rural users not welcome“.

This is a story about Apple and the way its delivery policies exacerbate the digital divide and make the poor poorer. To be fair, similar stories can be told about other software vendors, and it is hardly news that success in business is often at the expense of the weak and vulnerable. However, Apple’s decision to deliver Lion predominantly via App store is an iconic example of a growing problem.

I had been using Lion for a little over a week, not downloaded from App Store, but pre-installed on a brand new MacBook Air. However, whenever I plugged in my iPhone and tried to sync a message appeared saying the iTunes library was created with a newer version of iTunes and so iTunes needed to be updated. Each time I tried to initiate the update as requested, it started a long slow download dialogue, but some time later told me that the update had failed.

This at first seemed all a little odd on a brand new machine, but I think the reason is as follows:

When I first initialised the new Air I chose to have it sync data with a Time Machine backup from my previous machine.
The iTunes on the old machine was totally up-to-date due to regular updates.
Apple dealers do not bother to update machines before they are delivered.
The hotel WiFi connection did not have sufficient throughput for a successful update.

From an engineering point of view, the fragility of the iTunes library format is worrying; many will recall the way HyperCard was able to transfer stacks back and forth between versions without loss. Anyway the paucity of engineering in recent software is a different story!

It is the fact that the hotel WiFi was in sufficient for the update that concerns me here. It was fast enough to browse the web, without apparent delay, to check email etc. Part of the problem was that the hotel did offer two levels of service, one (more expensive!) aimed more at heavy multimedia use, so maybe that would have been sufficient. The essential update for the brand new machine consisted of 1.46 gigabytes of data, so perhaps not surprising the poor connection faltered.

I have been concerned for several years at the ever increasing size of regular software updates, which have increased from 100 Mbytes to now often several Gbytes¹. Usually these happen in the background and I have reasonable broadband at home, so they don’t cause me any problems personally, but I wonder about those with less good broadband, or those whose telephone exchanges do not support broadband at all. In the UK, this is mainly those outside major urban areas, who are out of reach of cable and fibre super-broadband and reliant on old BT copper lines. Thinking more broadly across the world, how many in less developed countries or regions will be able to regularly update software?

Of course old versions may well run better on old computers, but without updates it is not just that users cannot benefit from new features, but more critically they are missing essential security updates leaving the vulnerable to attack.

And this is not just a problem for those directly affected, but for us all, as it creates a fertile ground for bot armies to launch denial of service attacks and other forms of cybercrime or cyberterrorism. Each compromised machine is a future cyberwarrior or cybergangster.

However, the decision of Apple to launch Lion predominantly via App Store has significantly upped the stakes. Those with slower broadband connections may be able to manage updates, but the full operating system is an order of magnitude larger. Of course those with slower connections tend to be the poorer, more vulnerable, more marginalised; those without jobs, in rural areas, the elderly. It is as if Apple has put up a big notice:

“To the poor and weak
we don’t want you“

To be fair, Lion is (one feels grudgingly) also made available on USB drives, but at more than twice the price of the direct download². So this is not entirely shutting the door on the poor, but only letting them in if they pay extra. A tax on poverty.

Of course, this is not a deliberate act of aggression against the weak, just the normal course of business. The cheapest and easiest way to deliver software, and one that incidentally ensures that all revenue goes to Apple, is through direct online sales. The USB option adds complexity and cost to the distribution systems and Apple seem to be pricing to discourage use. This, like so many other ways in which the poor pay more, is just an ‘accident’ of the market economy.

But for a company that prides itself in design, surely things could be done more creatively?

One way would be to split software into two parts. One small part would be the ‘key’, essential to run it, but very small, The second part would constitute the bulk of the software, but be unusable without the ‘key’. The ‘key’ would then be sold solely on the App store, but would be small enough for anyone to download. The rest would be also made available online, but for free download and with a licence that allows third party distribution (and of course be suitably signed/encrypted to prevent tampering). Institutions or cybercafes could download it to local networks, entrepreneurs could sell copies on DVD or USB, but competition would mean this would be likely to end up far cheaper than Apple’s USB premium, close to the cost of the medium, with a small margin.

Of course the same method could be used for any software, not just Lion, and indeed even for software updates.

I’m sure Apple could think of alternative, maybe better, solutions. The problem is just that Apple’s designers, despite inordinate consideration for the appearance and appeal of their products, have simply not thought beyond the kind of users they meet in the malls of Cupertino.

Note, this is not an inevitable consequence of increasing complexity and (itself lamentable) code bloat. In the past software updates were often delivered as ‘deltas’, the changes between old and new. It seems that now an ‘update’ is in fact complete copies of entire major components.[back]
At the tiem of wrting tjis Mac OSX LIon is available for app store for $29.99, but USB thumb drive version is $69.99[back]

TTW2 – the second Tiree Tech Wave is approaching

Posted on September 28, 2011 by alan

It is a little over a month (3-7 Nov) until the next Tiree Tech Wave 🙂 However, as I’m going to be off-island most of the time until the end of October, it seems very close indeed!

The first registrations are in, including Clare flying straight here from the US¹ and Alessio coming from Madrid; mind you last time Azizah had come all the way from Malaysia, so still looking very parochial in comparison!

While I don’t expect we will be oversubscribed, do ‘book early’ (before Oct 10th) if you intend to come to help us plan things and make sure you get your preferred accommodation (the tent at the end of my garden is draughty in November) and travel.

If you want to take advantage of the island’s watersports, catch me in one place for more than a day, or simply hang out, do take a few extra days before or after the event. One person has already booked to arrive a couple of days early and others maybe also.

To see what the Tech Wave will be like see the Interfaces report … although it is the people who make the event, so I’m waiting to be surprised again this time round 🙂

Looking forward to seeing you.

In fact guided over the ocean by the ‘golf ball’ on Tiree, which is the North Atlantic civil radar.[back]

book: The Laws of Simplicty, Maeda

Posted on July 23, 2011 by alan

Yesterday I started to read John Maeda’s “The Laws of Simplicty” whilst sitting by Fiona’s stall at the annual Tiree agricultural show, then finished before breakfast today. Maeda describes his decision to cap at 100 pages¹ as something that could be read during a lunch break. To be honest 30,000 words sounds like a very long lunch break or a very fast reader, but true to his third law, “savings in time feel like simplicity”², it is a short read.

The shortness is a boon that I wish many writers would follow (including me). As with so many single issue books (e.g. Blink), there is s slight tendency to over-sell the main argument, but this is forgiveable in a short delightful book, in a way that it isn’t in 350 pages of less graceful prose.

I know I have a tendency, which can be confusing or annoying, to give, paradoxically for fear of misunderstanding, the caveat before the main point. Still, despite knowing this, in the early chapters I did find myself occasionally bristling at Maeda’s occasional overstatement (although in accordance with simplicity, never hyperbole).

One that particularly caught my eye was Maeda’s contrast of the MIT engineer’s RFTM (Read The F*cking Manual) with the “designer’s approach” to:

marry function with form to create intuitive experiences that we understand immediately.

Although in principle I agree with the overall spirit, and am constantly chided by Fiona for not reading instructions³, the misguided idea that everything ought to ‘pick up and use’ has bedeviled HCI and user interface design for at least the past 20 years. Indeed this is the core misconception about Heidegger’s hammer example that I argued against in a previous post “Struggling with Heidegger“. In my own reading notes, my comment is “simple or simplistic!” … and I meant here the statement not the resulting interfaces, although it could apply to both.

It has always been hard to get well written documentation, and the combination of single page ‘getting started’ guides with web-based help, which often disappears when the web site organisation changes, is an abrogation of responsibility by many designers. Not that I am good at this myself. Good documentation is hard work. It used to be the coders who failed to produce documentation, but now the designers also fall into this trap of laziness, which might be euphemistically labelled ‘simplicity’⁴.

Personally, I have found that the discipline of documenting (in the few times I have observed it!) is in fact a great driver of simple design. Indeed I recall a colleague, maybe Harold Thimbleby⁵, once suggested that documentation ought to be written before any code is written, precisely to ensure simple use.

Some years ago I was reading a manual (for a Unix workstation, so quite a few years ago!) that described a potentially disastrous shortcoming of a the disk sync command (which could have corrupted the disk). Helpfully the manual page included a suggestion of how to wrap sync in scripts that prevented the problem. This seemed to add insult to injury; they knew there was a serious problem, they knew how to fix it … and they didn’t do it. Of course, the reason is that manuals are written by technical writers after the code is frozen.

In contrast, I was recently documenting an experimental API⁶ so that a colleague could use it. As I wrote the documentation I found parts hard to explain. “It would be easier to change the code”, I thought, so I did so. The API, whilst still experimental, is now a lot cleaner and simpler.

Coming back to Maena after a somewhat long digression (what was that about simplicity and brevity?). While I prickled slightly at a few statements, in fact he very clearly says that the first few concrete ‘laws’ are the simpler (and if taken in their own simplistic), the later laws are far more nuanced and suggest deeper principles. This includes law 5 “differences: simplicity and complexity need each other”, which suggest that one should strive for a dynamic between simplicity and complexity. This echoes the emphasis on texture I often advocate when talking with students; whether in writing, presenting or in experience design it is often the changes in voice, visual appearance, or style which give life.

the simplest interface?

I wasn’t convinced by Maeda’s early claim that simple designs were simpler and cheaper to construct. Possibly true for physical prodcuts, but rarely so for digital interfaces, where more effort is typically needed in code to create simpler user interfaces. However, again this was something that was revisited later, especially in the context of more computationally active systems (“law 8, in simplicity we trust”), where he contrasts “how much do you need to know about a system?” with “how much does the system know about you?”. The former is the case of more traditional passive systems, whereas more ‘intelligent’ systems such as Amazon recommendations (or even Facebook news feed) favour the latter. This is very similar to the principles for incidental and low-intention interaction that I have discussed in the past⁷.

Finally “The Laws of Simplicity” is beautifully designed in itself. It includes many gems not least those arising from Maeda’s roots in Japanese design culture, including aichaku, the “sense of attachment one can feel for an artefact” (p.69) and omakase meaning “I leave it to you”, which asks the sushi chef to create a meal especially for you (p.76). I am perhaps too much of a controller to feel totally comfortable with the latter, but Maeda’s book certainly inspires the former.

In fact there are 108 pages in the main text, but 9 of these are full page ‘law/chapter’ frontispieces, so 99 real pages. However, if you include the 8 page introduction that gives 107 … so even the 100 page cap is perhaps a more subtle concept than a strict count.[back]
See his full 10 laws of simplicity at lawsofsimplicity.com[back]
My guess is that the MIT engineers didn’t read the manuals either.[back]
Apple is a great — read poor — example here as it relies on keen technofreaks to tell others about the various hidden ways to do things — I guess creating a Gnostic air to the devices.[back]
Certainly Harold was a great proponent of ‘live’ documentation, both Knuth’s literate programming and also documentation that incorporated calculated input and output, rather like dexy, which I reported after last autumn’s Web Art/Science camp.[back]
In fairness, the API had been thrown together in haste for my own use.[back]
See ‘incidental interaction” and HCI book chapter 18.[back]

Six weeks on the road

Posted on July 20, 2011 by alan

I’ve been at home for the last week after six weeks travelling around the UK and elsewhere. I’ve not kept up while on the road so doing a retrospective post on it all and need to try to catch on other half written posts.

As well as time at Talis offices in B’ham and at Lancs (including exam board week), travels have taken me to Pisa for a workshop on ‘Supportive User Interfaces’, to Koblenz for Web Science conference giving a talk on embodiment issues and a poster on web-scale reasoning , to Newcastle for British HCI conference doing a talk on fridge, to Nottingham to give a talk on extended episodic experience, and back to Lancs for a session on creativity! Why can’t I be like sensible folks and talk on one topic!

Supportive User Interfaces

Monday 13th June I attended a workshop in Pisa on “Supportive User Interfaces“, which includes interfaces that adapt in various ways to users. The majority of people there were involved in various forms of model-based user interfaces in which various models of the task, application and interaction are used to generate user interfaces on the fly. W3C have had a previous group in this area; Dave Raggett from w3c was at the workshop and it sounds like there will be a new working group soon. This clearly has strong links to various forms of ‘meta-level’ representations of data, tasks, etc.. My own contribution started the day, framing the area, focusing partly on reasons for having more ‘meta-level’ interfaces including social empowerment, and partly on the principles/techniques that need to be considered at a human level.

Also on Monday was a meeting of IFIP Working Group 2.7/13.4. IFIP is the UNESCO founded pan-national agency that national computer societies such as as the BCS in the UK and ACM and IEEE Computer in the US belong to. Working Group 2.7/13.4 is focused on the engineering of user interfaces. I had been actively involved in the past, but have had many years’ lapse. However, this seemed a good thing to re-engage with with my new Talis hat on!

SUI: paper:

Opening the Box: Meta-level Interfaces Needs and Solutions

Web Science Conference in Koblenz

Jaime Teevan from Microsoft gave the opening keynote at WebSci 2011. I know her from her earlier work on personal information management, but her recent work and keynote was about work on analysing and visualising changes in web pages. Web page changes are also analysed alongside users re-visitation patterns; by looking at the frequency of re-visitation Jaime and her colleagues are able to identify the parts of pages that change with similar frequency, helping them, inter alia, to improve search ranking.

Had many great conversations, some with people I know previously (e.g. the Southampton folks), but also new, including the group at Troy that do lots of work with data.gov. I was particularly interested in some work using content matching to look for links between otherwise unlinked (or only partly inter-linked) datasets. Also lots of good presentations including one on trust prediction and a fantastic talk by Mark Bernstein from Eastgate, which he delivered in blank verse!

My own contribution included the poster that Dave@Talis prepared, which was on the web-scale spreading activation work in collaboration with Univ. Athens. Quite a niche area in a multi-disciplinary conference, so didn’t elicit quite the interest of the social networking posters, but did lead to a small number of in depth discussions.

In addition I gave talk on the more cognitive/philosophical issues when we start to use the web as an external extension to / replacement of memory, including its impact on education. Got some good feedback from this.

Closing keynote was from Barry Wellman, the guy who started social network analysis way before they were on computers. At one point he challenged the Dunbar number¹. I wondered whether this was due to cognitive extension with address books etc., but he didn’t seem to think so; there is evidence that some large circles predate web (although maybe not physical address books). Made me wonder about itinerant tradesmen, tinkers, etc., even with no prostheses. Maybe the numbers sort of apply to any single content, but are repeated for each new context?

WebSci papers:

The HCI Conference – Newcastle

I attended the British HCI conference in Newcastle. This was the 25th conference, and as my very first academic paper in computing² was at the first BHCI in 1984, I was pleased to be there at this anniversary. The paper I was presenting was a retrospective on vfridge, a social networking site dating back to 1999/2000, it seemed an historic occasion!

As is always the case presentations were all interesting. Strictly BHCI is a ‘second tier’ conference compared with CHI, but why is it that the papers are always more interesting, that I learn more? It is likely that a fair number of papers were CHI rejects, so it should be the other way round – is it that selectivity and ‘quality’ inevitably become conservative and boring?

Gregory Abowd gave the closing keynote. It was great to see Gregory again, we meet too rarely. The main focus of his keynote was on three aspects of research: novelty, value and reliability and how his own work had moved within this space over the years. In particular having two autistic sons has led him in directions he would never have considered, and this immediately valuable work has also created highly novel research. Novelty and value can coexist.

Gregory also reflected on the BHCI conference as it was his early academic ‘home’ when he did his PhD and postdoctoral here in the late 1980’s. He thought that it could be rather than, as with many conferences, a second best to getting a CHI paper, instead a place for (not getting the quote quite perfect) “papers that should get into CHI”, by which he meant a proving ground for new ideas that would then go on to be in CHI.

However I initially read the quote differently. BHCI always had a broader concept of HCI compare with CHI’s quite limited scope. That is BHCI as a place that points the way for the future of HCI, just as it was the early nurturing place of MobileHCI. However CHI has now become much broader in it’s own conception, so maybe this is no longer necessary. Indeed at the althci session the organisers said that their only complaint was that the papers were not ‘alt’ enough – that maybe ‘alt’ had become mainstream. This prompted Russell Beale to suggest that maybe althci should now be real science such as replication!

Gregory also noted the power of the conference as a meeting ground. It has always been proud of the breadth of international attendance, but perhaps it is UK saturation that should be it’s real measure of success. Of course the conference agenda has become so full and international travel so much cheaper than it was, so there is a tendency to go to the more topic specific international conferences and neglect the UK scene. This is compounded by the relative dearth of small UK day workshops that used to be so useful in nurturing new researchers.

I feel a little guilty here as this was the first BHCI I had been to since it was in Lancaster in 2007 … as Tom McEwan pointed out I always apologise but never come! However, to be fair I have also only been twice to CHI in the last 10 years, and then when it was in Vienna and Florence. I have just felt too busy, so avoiding conferences that I did not absolutely have to attend.

In response to Gregory’s comments, someone, maybe Tom, mentioned that in days of metrics-based research assessment there was a tendency to submit one’s best work to those venues likely to achieve highest impact, hence the draw of CHI. However, I have hardly ever published in CHI and I think only once in TOCHI, yet, according to Microsoft Research, I am currently the most highly cited HCI researcher over the last 5 years … So you don’t have to publish in CHI to get impact!

And incidentally, the vfridge paper had NOT been submitted to CHI, but was specially written for BHCI as it seemed the fitting place to discuss a thoroughly British product 🙂

vfridge paper:

Anatomy of an Early Social Networking Site

Nottingham MRL

I was at Mixed Reality Lab in Nottingham for Joel Fischer‘s PhD viva and while there did a seminar the afternoon on “extended episodic experience” based on Haliyana Khalid‘s PhD work and ideas that arose from it. Basically, whereas ‘user experience’ has become a big issue most of the work is focused on individual ‘experiences’ whereas much of life consists of ongoing series of experiences (episodes) which together make up the whole experience of interacting with a person or place, following a band, etc.

I had obviously not done a good enough job at wearing Joel down with difficult questions in the PhD viva in the morning as he was there in the afternoon to ask difficult questions back of his own 😉

Docfest – Digital Economy Summer School

The last major event was Docfest, which brought together the PhD students from the digital economy centres from around the country. Not sure of the exact count but just short of 150 participants I think. They come from a wide variety of backgrounds, business, design, computing, engineering, and many are mature students with years of professional experience behind them.

This looked like being a super event, unfortunately I was only able to attend for a day 🙁 However, I had a great evening at the welcome event talking with many of the students and even got to ride in Steve Forshaw‘s Sinclair C5!

My contribution to the event was running the first morning session on ‘creativity’. Surprise, surprise this started with a bad ideas session, but new for me too as the largest group I’ve run in the past has been around 30. There were a number of local Highwire students acting as facilitators for the groups, so I had only to set them off and observe results :-). At the end of the morning I gave some the theoretical background to bad ideas as a method and in understanding (aspects of) creativity more widely.

Other speakers at the event included Jane Prophet, Chris Csikszentmihalyi and Chris Bonnington, so was sad to miss them; although I did get a fascinating chat with Jane over breakfast in the hotel hearing about her new projects on arts and neural imaging, and on how repetitious writing induces temporary psychosis … That is why the teachers give lines, to send the pupils bonkers!

The idea that there are fundamental cognitive limits on social groups with different sized circles family~6, extended family~20, village~60, large village~200[back]
I had published previously in agricultural engineering.[back]

hyperreal tactile iPad keyboard

Posted on July 20, 2011 by alan

The soft keyboards on iPhone and iPad are surprisingly good. On the iPad I am finding my 3 finger typing almost as fast as on a physical keyboard … although a little more error prone as I keep mis-hitting the space bar and getting ‘n’ or ‘b’ instead (as inmthis shortbphrasemi am typing now), which the poor spell corrector can’t manage at all.

Of course if ou are a touch typist things may be different.

I was especially struck by the image of the keyboard on the iPad, which on the ‘F’ and ‘J’ keys has an image of the tiny tactile bar that allows easy finger positioning by touch alone. However, this is of course a flat screen and I am feeling I am in some sort of surreal world.

F key on iPad showing 'tactile' image

Are five users enough?

Posted on June 4, 2011 by alan

[An edited version of this post is reproduced at HCIbook online!]

I recently got a Facebook message from a researcher about to do a study who asked, “Do you think 5 (users) is enough?”

Along with Fitts’ Law and Miller’s 7+/-2 this “five is enough” must be among the most widely cited, and most widely misunderstood and misused aphorisms of HCI and usability. Indeed, I feel that this post belongs more in ‘Myth Busters” than in my blog.

So, do I think five is enough? Of course, the answer is (annoyingly), “it depends”, sometimes, yes, five is enough, but sometimes, fewer: one appropriate user, or maybe no users at all, and sometimes you need more, maybe many more: 20, 100, 1000. But even when five is enough, the reasons are usually not directly related to Nielsen and Landauer’s original paper, which people often cite (although rarely read) and Nielsen’s “Why You Only Need to Test With 5 Users” alert box (probably more often actually read … well at least the title).

The “it depends” is partly dependent on the particular circumstances of the situation (types of people involved, kind of phenomenon, etc.), and partly on the kind of question you want to ask. The latter is the most critical issue, as if you don’t know what you want to know, how can you know how many users you need?

There are several sorts of reasons for doing some sort of user study/experiment, several of which may apply:

1. To improve a user interface (formative evaluation)

2. To assess whether it is good enough (summative evaluation)

3. To answer some quantitative question such as “what % of users are able to successfully complete this task”

4. To verify or refute some hypothesis such as “does eye gaze follow Fitts’ Law”

5. To perform a broad qualitative investigation of an area

6. To explore some domain or the use of a product in order to gain insight

It is common to see HCI researchers very confused about these distinctions, and effectively perform formative/summative evaluation in research papers (1 or 2) where in fact one of 3-6 is what is really needed.

I’ll look at each in turn, but first to note that, to the extent there is empirical evidence for ‘five is enough”, it applies to the first of these only.

I dealt with this briefly in my paper “Human–Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail” in the John Long Festschrift edition of IwC, and quote here:

In the case of the figure of five users, this was developed based on a combination of a mathematical model and empirical results (Nielsen and Landauer 1993). The figure of five users is:

(i) about the optimal cost/benefit point within an iterative development cycle, considerably more users are required for summative evaluation or where there is only the opportunity for a single formative evaluation stage;

(ii) an average over a number of projects and needs to be assessed on a system by system basis; and

(iii) based on a number of assumptions, in particular, independence of faults, that are more reasonable for more fully developed systems than for early prototypes, where one fault may mask another.

We’ll look at this in more detail below, but critically, the number ‘5’ is not a panacea, even for formative evaluation.

As important as the kind of question you are asking, are the kind of users you are using. So much of modern psychology is effectively the psychology of first year psychology undergraduates (in the in 1950s it was male prisoners). Is this representative? Does this matter? I’ll return to this at the end, but first of all look briefly at each kind of question.

Finally, there is perhaps the real question “will the reviewers of my work think five users is enough” — good publications vs. good science. The answer is that they will certainly be as influenced by the Myth of Five Users as you are, so do good science … but be prepared to need to educate your reviewers too!

formative evaluation – prototyping cycle

As noted formative evaluation was the scope of Nielsen and Landauer’s early work in 1993 that was then cited by Nielsen in his Alert Box in 2000, and which has now developed mythic status in the field.

The 1993 paper was assuming a context of iterative development where there would be many iterations, and asking how many users should be used per iteration, that is how many users should you test before fixing the problems found by those users, and then performing another cycle of user testing, and another. That is, in all cases they considered, the total number of users involved would be far more than five, it is just the number used in each iteration that was lower.

In order to calculate the optimal number of subjects to use per iteration, they looked at:

(i) the cost of performing a user evaluation

(ii) the number of new faults found (additional users will see many of the same faults, so there are diminishing returns)

(iii) the cost of a redevelopment cycle

All of these differ depending on the kind of project, so Nielsen and Landauer looked at a range of projects of differing levels of complexity. By putting them together, and combining with simple probabilistic models of bug finding in software, you can calculate an optimal number of users per experiment.

They found that, depending on the project, the statistics and costs varied and hence the optimal number of users/evaluators (between 7 and 21), with, on the whole, more complex projects (with more different kinds of bugs and more costly redevelopment cycles) having a higher optimal number than simpler projects. In fact all the numbers are larger than five, but five was the number in Nielsen’s earlier discount engineering paper, so the paper did some additional calculations that yielded a different kind of (lower) optimum (3.2 users — pity the last 0.2 user), with five somewhere between 7 and 3 … and a myth was born!

Today, with Web 2.0 and ‘perpetual beta’, agile methods and continuous deployment reduce redevelopment costs to near zero, and so Twidale and Marty argue for ‘extreme evaluation‘ where often one user may be enough (see also my IwC paper).

The number also varies through the development process; early on, one user (indeed using it yourself) will find many, many faults that need to be fixed. Later faults become more obscure, or maybe only surface after long-term use.

Of course, if you use some sort of expert or heuristic evaluation, then the answer may be no real users at all!

And anyway all of this is about ‘fault finding’, usability is not about bug fixing but making things better, it is not at all clear how useful, if at all, literature on bug fixing is for creating positive experiences.

summative evaluation – is it good enough to release

If you are faced with a product and want to ask “is it good enough?” (which can mean, “are there any usability ‘faults’?”, or, “do people want to use it?”), then five users is almost certainly not enough. To give yourself any confidence of coverage of the kinds of users and kinds of use situations, you may need tens or hundreds of users, very like hypothesis testing (below).

However, the answer here may also be zero users. If the end product is the result of a continuous evaluation process with one, five or some other number of users per iteration, then the number of users who have seen the product during this process may be sufficient, especially if you are effectively iterating towards a steady state where few or no new faults are found per iteration.

In fact, even when there has been a continuous process, the need for long-term evaluation becomes more critical as the development progresses, and maybe the distinction between summative and late-stage formative is moot.

But in the end there is only one user you need to satisfy — the CEO … ask Apple.

quantitative questions and hypothesis testing

(Note, there are real numbers here, but if you are a numerophobe never fear, the next part will go back to qualitative issues, so bear with it!)

Most researchers know that “five is enough” does not apply in experimental or quantitative studies … but that doesn’t always stop them quoting numbers back!

Happily in dealing with more quantitative questions or precise yes/no ones, we can look to some fairly solid statistical rules for the appropriate number of users for assessing different kinds of effects (but do note “the right kind of users” below). And yes, very, very occasionally five may be enough!

Let’s imagine that our hypothesis is that a behaviour will occur in 50% of users doing an experiment. With five users, the probability that we will see this behaviour in at least one user is 1 in 32, which is approximately 3%. That is if we do not observe the behaviour at all, then we have a statistically significant result at 5% level (p<0.05) and can reject the hypothesis.

Note that there is a crucial difference between a phenomenon that we expect to see in about 50% of user iterations (i.e. the same user will do it about 50% of the time) and one where we expect 50% of people to do it all of the time. The former we can deal with using a small number of users and maybe longer or repeated experiments, the latter needs more users.

If instead, we wanted to show that a behaviour happens less than 25% of the time, then we need at least 11 users, for 10% 29 users. On the other hand, if we hypothesised that a behaviour happens 90% of the time and didn’t see it in just two users we can reject the hypothesis at significance level of 1%. In the extreme if our hypothesis is that something never happens and we see it with just one user, or if the hypothesis is that it always happens and we fail to see it with one user, in both cases we can reject our hypothesis.

The above only pertains when you see samples where either none or all of the users do something. More often we are trying to assess some number. Rather than “does this behaviour occur 50% of the time”, we are asking “how often does this behaviour occur”.

Imagine we have 100 users (a lot more than five!), and notice that 60% do one thing and 40% do the opposite. Can we conclude that in general the first thing is more prevalent? The answer is yes, but only just. Where something is a simple yes/no or either/or choice and we have counted the replies, we have a binomial distribution. If we have n (100) users and the probability of them answering ‘yes’ is p (50% if there is no real preference), then the maths says that the average number of times we expect to see a ‘yes’ response is n x p = 100 x 0.5 = 50 people — fairly obvious. It also says that the standard deviation of this count is sqrt(n x p x (1-p ) ) = sqrt(25) = 5. As a rule of thumb if answers differ by more than 2 standard deviations from the expected value, then this is statistically significant; so 60 ‘yes’ answers vs. the expected 50 is significant at 5%, but 55 would have just been ‘in the noise’.

Now drop this down to 10 users and imagine you have 7 ‘yes’s and 3 ‘no’s. For these users, in this experiment, they answer ‘yes’ more than twice as often as ‘no’, but here this difference is still no more than we might expect by chance. You need at least 8 to 2 before you can say anything more. For five users even 4 to 1 is quite normal (try tossing five coins and see how many come up heads); only if all or none do something can you start to think you are onto something!

For more complex kinds of questions such as “how fast”, rather than “how often”, the statistics becomes a little more complex, and typically more users are needed to gain any level of confidence.

As a rule of thumb some psychologists talk of 20 users per condition, so if you are comparing 4 things then you need 80 users. However, this is just a rule of thumb and some phenomena have very high variability (e.g. problem solving) whereas others (such as low-level motor actions) are more repeatable for an individual and have more consistency between users. For phenomena with very high variability even 20 users per condition may be too few, although within subjects designs may help if possible. Pilot experiments or previous experiments concerning the same phenomenon are important, but this is probably the time to consult a statistician who can assess the statistical ‘power’ of a suggested design (the likelihood that it will reveal the issue of interest).

qualitative study

Here none of the above applies and … so … well … hmm how do you decide how many users? Often people rely on ‘professional judgement’, which is a posh way of saying “finger in the air”.

In fact, some of the numerical arguments above do still apply (sorry numerophobes). If as part of your qualitative study you are interested in a behaviour that you believe happens about half the time, then with five users you would be very unlucky not to observe it (3% of the time). Or put it another way, if you observe five users you will see around 97% of behaviours that at least half of all users have (with loads and loads of assumptions!).

If you are interested in rarer phenomena, then you need either lots more users (for behaviour that you only see in 1 in 10 users, then you have only a 40% chance of observing it with 5 users, and perhaps more surprisingly, only 65% chance of seeing it with 10 users).

However, if you are interested in a particular phenomenon, then randomly choosing people is not the way to go anyway, you are obviously going to select people who you feel are most likely to exhibit it; the aim is not to assess its prevalence in the world, but to find a few and see what happens.

Crucially when you generalise from qualitative results you do it differently.

Now in fact you will see many qualitative papers that add caveats to say “our results only apply to the group studied …”. This may be necessary to satisfy certain reviewers, but is at best disingenuous – if you really believe the results of your qualitative work do not generalise at all, then why are you publishing it – telling me things that I cannot use?

In fact, we do generalise from qualitative work, with care, noting the particular limitations of the groups studied, but still assume that the results are of use beyond the five, ten or one hundred people that we observed. However, we do not generalise through statistics, or from the raw data, but through reasoning that certain phenomena, even if only observed once, are likely to be ones that will be seen again, even if differing in details. We always generalise from our heads, not from data.

Whether it is one, five or more, by its nature deep qualitative data will involve fewer users than more shallow methods such as large scale experiments or surveys. I often find that the value of this kind of deep interpretative data is enhanced by seeing it alongside large-scale shallow data. For example, if survey or log data reveals that 70% of users have a particular problem and you observe two users having the same problem, then it is not unreasonable to assume that the reasons for the problem are similar to those of the large sample — yes you can generalise from two!

Indeed one user may be sufficient (as often happens with medical case histories, or business case studies), but often it is about getting enough users so that interesting things turn up.

exploratory study

This looking for interesting things is often the purpose of research: finding a problem to tackle. Once we have found an interesting issue, we may address it in many ways: formal experiments, design solutions, qualitative studies; but none of these are possible without something interesting to look at.

In such situations, as we saw with qualitative studies in general, the sole criteria for “is N enough” is whether you have learnt something.

If you want to see all, or most of the common phenomena, then you need lots of users. However, if you just want to find one interesting one, then you only need as many as gets you there. Furthermore whilst you often choose ‘representative or ‘typical’ users (what is a typical user!) for most kinds of study and evaluation, for exploratory analysis, often extreme users are most insightful; of course you have to work out whether your user or users are so unusual that the things you observe are unique to them … but again real research comes from the head, you have to think about it and make an assessment.

In the IwC paper I discuss some of the issues of single person studies in more detail and Fariza Razak’s thesis is all about this.

the right kind of users

If you have five, fifty or five hundred users, but they are all psychology undergraduates, they are not going to tell you much about usage by elderly users, or by young unemployed people who have left school without qualifications.

Again the results of research ultimately come from the head not the data: you will never get a complete typical, or representative sample of users; the crucial thing is to understand the nature of the users you are studying, and to make an assessment of whether the effects you see in them are relevant, and interesting more widely. If you are measuring reaction times, then education may not be a significant factor, but Game Boy use may be.

Many years ago I was approached by a political science PhD student. He had survey data from over 200 people (not just five!), and wanted to know how to calculate error bars to go on his graphs. This was easily done and I explained the procedure (a more systematic version of the short account given earlier). However, I was more interested in the selection of those 200 people. They were Members of Parliament; he had sent the survey to every MP (all 650 of them) and got over 200 replies, a 30% return rate, which is excellent for any survey. However, this was a self-selected group and so I was more interested in whether the grounds for self-selection influenced the answers than in how many of them there were. It is often the case that those with strong views on a topic are more likely to answer surveys on it. The procedure he had used was as good as possible, but, in order to be able to make any sort of statement about the interpretation of the data, he needed to make a judgement. Yet again knowledge is ultimately from the head not the data.

For small numbers of users these choices are far more critical. Do you try and choose a number of similar people, so you can contrast them, or very different so that you get a spread? There is no right answer, but if you imagine having done the study and interpreting the results this can often help you to see the best choice for your circumstances.

being practical

In reality whether choosing how many, or who, to study, we are driven by availability. It is nice to imagine that we make objective selections based on some optimal criteria — but life is not like that. In reality, the number and kind of users we study is determined by the number and kind of users we can recruit. The key thing is to understand the implications of these ‘choices’ and use these in your interpretation.

As a reviewer I would prefer honesty here, to know how and why users were selected so that I can assess the impact of this on the results. But that is a counsel of perfection, and again good science and getting published are not the same thing! Happily there are lovely euphemisms such as ‘convenience sample’ (who I could find) and ‘snowball sample’ (friends of friends, and friends of friends of friends), which allow honesty without insulting reviewers’ academic sensibilities.

in the end

Is five users enough? It depends: one, five, fifty or one thousand (Google test live with millions!). Think about what you want out of the study: numbers, ideas, faults to fix, and the confidence and coverage of issues you are interested in, and let that determine the number.

And, if I’ve not said it enough already, in the end good research comes from your head, from thinking and understanding the users, the questions you are posing, not from the fact that you had five users.

references

A. Dix (2010) Human-Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail. Interacting with Computers, 22(1) pp. 13-27. http://www.hcibook.com/alan/papers/IwC-LongFsch-HCI-2010/

Nielsen, J. (1989). Usability engineering at a discount. In Salvendy, G., and Smith, M.J. (Eds.), Designing and Using Human–Computer Interfaces and Knowledge Based Systems, Elsevier Science Publishers, Amsterdam. 394-401.

Nielsen, J. and Landauer, T. K. 1993. A mathematical model of the finding of usability problems. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands, April 24 ? 29, 1993). CHI ’93. ACM, New York, NY, 206?213. http://doi.acm.org/10.1145/169059.169166

Jakob Nielsen’s Alertbox, March 19, 2000: Why You Only Need to Test With 5 Users. http://www.useit.com/alertbox/20000319.html

Fariza Razak (2008). Single Person Study: Methodological Issues. PhD Thesis. Computing Department, Lancaster University, UK. February 2008. http://www.hcibook.net/people/Fariza/

Michael Twidale and Paul Marty (2004-) Extreme Evaluation. http://people.lis.uiuc.edu/~twidale/research/xe/

the real tragedy of the commons

Posted on March 6, 2011 by alan

I’ve just been reviewing a paper that mentions the “tragedy of the commons”¹ and whenever I read or hear the phrase I feel the hackles on the back of my neck rise.

Of course the real tragedy of the commons was not free-riding and depletion by common use, but the rape of the land under mass eviction or enclosure movements when they ceased to be commons. The real tragedy of “the tragedy of the commons” as a catch phrase is that it is often used to promote the very same practices of centralisation. Where common land has survived today, just as in the time before enclosures and clearances, it is still managed in a collaborative way both for the people now and the for the sake of future generations. Indeed on Tiree, where I live, there are large tracts of common grazing land managed in just such a way.

It is good to see that the Wikipedia article of “Tragedy of the Commons” does give a rounded view on the topic including reference to an historical and political critique by “Ian Angus”²

The paper I was reading was not alone in uncritically using the phrase. Indeed in “A Framework for Web Science”³ we read:

In a decentralised and growing Web, where there are no “owners” as such, can we be sure that decisions that make sense for an individual do not damage the interests of users as a whole? Such a situation, known as the ‘tragedy of the commons’, happens in many social systems that eschew property rights and centralised institutions once the number of users becomes too large to coordinate using peer pressure and moral principles.

In fact I do have some sympathy with this as the web involves a vast number of physically dispersed users who are perhaps “too large to coordinate using peer pressure and moral principles”. However, what is strange is that the web has raised so many modern counter examples to the tragedy of the commons, not least Wikipedia itself. In many open source projects people work as effectively a form of gift economy, where, if there is any reward, it is in the form of community or individual respect.

Clearly, there are examples in the world today where many individual decisions (often for short term gain) lead to larger scale collective loss. This is most clearly evident in the environment, but also the recent banking crisis, which was fuelled by the desire for large mortgages and general debt-led lives. However, these are exactly the opposite of the values surrounding traditional common goods.

It may be that the problem is not so much that large numbers of people dilute social and moral pressure, but that the impact of our actions becomes too diffuse to be able to appreciate when we make our individual life choices. The counter-culture of many parts of the web may reflect, in part, the way in which aspects of the web can make the impact of small individual actions more clear to the individual and more accountable to others.

Garrett Hardin, “The Tragedy of the Commons”, Science, Vol. 162, No. 3859 (December 13, 1968), pp. 1243-1248. … and here is the danger of citation counting as a quality metric, I am citing it because I disagree with it![back]
Ian Angus. The Myth of the Tragedy of the Commons. Socialist Voice, August 24, 2008[back]
Berners-Lee, T., Hall, W., Hendler, J. A., O’Hara, K., Shadbolt, N. and Weitzner, D. J. (2006) A Framework for Web Science. Foundations and Trends in Web Science, 1 (1). pp. 1-130. http://eprints.ecs.soton.ac.uk/13347/[back]

Alan Dix

Category Archives: web development