books: The Nature of Technology (Arthur) and The Evolution of Technology (Basalla)

Posted on March 16, 2012 by alan

I have just finished reading “The Nature of Technology” (NoT) by W. Brian Arthur and some time ago read “The Evolution of Technology” (EoT) by George Basalla, both covering a similar topic, the way technology has developed from the earliest technology (stone axes and the wheel), to current digital technology. Indeed, I’m sure Arthur would have liked to call his book “The Evolution of Technology” if Basalla had not already taken that title!

We all live in a world dominated by technology and so the issue of how technology develops is critical to us all. Does technology ultimately serve human needs or does it have its own dynamics independent of us except maybe as cogs in its wheels? Is the arc of technology inevitable or does human creativity and invention drive it in new directions? Is the development of technology now similar (albeit a bit faster) than previous generations, or does digital technology fundamentally alter things?

Basalla was published in 1988, while Arthur is 2009, so Arthur has 20 years more to work on, not much compared to 2 million years for the stone axe and 5000 years for the wheel, but 20 years that has included the dot.com boom (and bust!), and the growth of the internet. In a footnote (NoT,p.17), Arthur describes Basalla as “the most complete theory to date“, although then does not appear to directly reference Basalla again in the text – maybe because they have different styles. Basalla (a historian of technology) offering a more descriptive narrative whilst Arthur (and engineer and economist) seeks a more analytically complete account. However I also suspect that Arthur discovered Basella’s work late and included a ‘token’ reference; he says that a “theory of technology — an “ology” of technology” is missing (NoT,p.14), but, however partial, Basella’s account cannot be seen as other than part of such a theory.

Both authors draw heavily, both explicitly and implicitly, on Darwinian analogies, but both also emphasise the differences between biological and technological evolution. Neither is happy with, what Basella calls the “heroic theory of invention” where “inventions emerge in a fully developed state from the minds of gifted inventors” (EoT,p.20). In both there are numerous case studies which refute these more ‘heroic’ accounts, for example Watts’ invention of the steam engine after seeing a kettle lid rattling on the fire, and show how these are always built on earlier technologies and knowledge. Arthur is more complete in eschewing explanations that depend on human ingenuity, and therein, to my mind, lies the weakness of his account. However, Arthur does take into account, as central mechanism, the accretion of technological complexity through the assembly of components, al but absent from Basella’s account — indeed in my notes as I read Basella I wrote “B is focused on components in isolation, forgets implication of combinations“.

I’ll describe the main arguments of each book, then look at what a more complete picture might look like.

(Note, very long post!)

Basella: the evolution of technology

Basella’s describes his theory of technological evolution in terms of four concepts:

diversity of artefacts — acknowledging the wide variety both of different kinds of things, but also variations of the same thing — one example, dear to my heart, is his images of different kinds of hammers 🙂
continuity of development — new artefacts are based on existing artefacts with small variations, there is rarely sudden change
novelty — introduced by people and influenced by a wide variety of psychological, social and economic factors … not least playfulness!
selection — winnowing out the less useful/efficient artefacts, and again influenced by a wide variety of human and technological factors

Basella sets himself apart both from earlier historians of technology (Gilfillan and Ogburn) who took an entirely continuous view of development and also the “myths of the heroic inventors” which saw technological change as dominated by discontinuous change.

He is a historian and his accounts of the development of artefacts are detailed and beautifully crafted. He takes great efforts to show how standard stories of heric invention, such as the steam engine, can be seen much more sensibly in terms of slower evolution. In the case of steam, the basic principles had given rise to Newcomen’s steam pump some 60 years prior to Watt’s first steam engine. However, whilst each of these stories emphasised the role of continuity, as I read them I was struck also by the role of human ingenuity. If Newcomen’s engine had been around since 1712 years, what made the development to a new and far more successful form take 60 years to develop? The answer is surely the ingenuity of James Watt. Newton said he saw further only because he stood on the shoulders of giants, and yet is no less a genius for that. Similaly the tales of invention seem to be both ones of continuity, but also often enabled by insights.

In fact, Basella does take this human role on board, building on Usher’s earlier work, which paced insight centrally in accounts of continuous change. This is particularly central in his account of the origins of novelty where he considers a rich set of factors that influence the creation of true novelty. This includes both individual factors such as playfulness and fantasy, and also social/cultural factors such as migration and the patent system. It is interesting however that when he turns to selection, it is lumpen factors that are dominant: economic, military, social and cultural. This brings to mind Margaret Bowden’s H-creativity and also Csikszentmihalyi’s cultural views of creativity — basically something is only truly creative (or maybe innovative) when it is recognised as such by society (discuss!).

Arthur: the nature of technology

Basella ends his book confessing that he is not happy with the account of novelty as provided from historical, psychological and social perspectives. Arthur’s single reference to Basella (endnote, NoT, p.17) picks up precisely this gap, quoting Basella’s “inability to account fully for the emergence of novel artefacts” (EoT,p.210). Arthur seeks to fill this gap in previous work by focusing on the way artefacts are made of components, novelty arising through the hierarchical organisation and reorganisation of these components, ultimately built upon natural phenomena. In language reminiscent of proponents of ‘computational thinking‘, Arthur talks of a technology being the “programming of phenomena for our purposes” (NoT,p.51). Although, not directly on this point, I should particularly liked Arthur’s quotation from Charles Babbage “I wish to God this calculation had been executed by steam” (NoT,p.74), but did wonder whether Arthur’s computational analogy for technology was as constrained by the current digital perspective as Babbage’s was by the age of steam.

Although I’m not entirely convinced at the completeness of hierarchical composition as an explanation, it is certainly a powerful mechanism. Indeed Arthur views this ‘combinatorial evolution’ as the key difference between biological and technological evolution. This assertion of the importance of components is supported by computer simulation studies as well as historical analysis. However, this is not the only key insight in Arthur’s work.

Arthur emphasises the role of what he calls ‘domains’, in his words a “constellation of technologies” forming a “mutually supporting set” (NoT,p.71). These are clusters of technologies/ideas/knowledge that share some common principle, such as ‘radio electronics’ or ‘steam power’. The importance of these are such that he asserts that “design in engineering begins by choosing a domain” and that the “domain forms a language” within which a particular design is an ‘utterance’. However, domains themselves evolve, spawned from existing domains or natural phenomena, maturing, and sometimes dying away (like steam power).

The mutual dependence of technology can lead to these domains suddenly developing very rapidly, and this is one of the key mechanisms to which Arthur attributes more revolutionary change in technology. Positive feedback effects are well studied in cybernetics and is one of the key mechanisms in chaos and catastrophe theory which became popularised in the late 1970s. However, Arthur is rare in fully appreciating the potential for these effects to give rise to sudden and apparently random changes. It is often assumed that evolutionary mechanisms give rise to ‘optimal’ or well-fitted results. In other areas too, you see what I have called the ‘fallacy of optimality’¹; for example, in cognitive psychology it is often assumed that given sufficient practice people will learn to do things ‘optimally’ in terms of mental and physical effort.

human creativity and ingenuity

Arthur’s account is clearly more advanced than the early more gradualists, but I feel that in pursuing the evolution of technology based on its own internal dynamics, he underplays the human element of the story. Arthur even goes so far as to describe technology using Maturna’s term autopoetic (NoT,p.170) — something that is self-(re)producing, self-sustaining … indeed, in some sense with a life of its own.

However, he struggles with the implications of this. If, technology responds to “its own needs” rather than human needs, “instead of fitting itself to the world, fits the world to itself” (NoT,p.214), does that mean we live with, or even within, a Frankenstein’s monster, that cares as little for the individuals of humanity as we do for our individual shedding skin cells? Because of positive feedback effects, technology is not deterministic; however, it is rudderless, cutting its own wake, not ours.

In fact, Arthur ends his book on a positive note:

“Where technology separates us from these (challenge, meaning, purpose, nature) it brings a type of death. But where it affirms these, it affirms life. It affirms our humanness.” (NoT,p.216)

However, there is nothing in his argument to admit any of this hope, it is more a forlorn hope against hope.

Maybe Arthur should have ended his account at its logical end. If we should expect nothing from technology, then maybe it is better to know it. I recall as a ten-year old child wondering just these same things about the arc of history: do individuals matter? Would the Third Reich have grown anyway without Hitler and Britain survived without Churchill? Did I have any place in shaping the world in which I was to live? Many years later as I began to read philosophy, I discovered these were questions that had been asked before, with opposing views, but no definitive empirical answer.

In fact, for technological development, just as for political development, things are probably far more mixed, and reconciling Basella and Arthur’s accounts might suggest that there is space both for Arthur’s hope and human input into technological evolution.

Recall there were two main places where Basella placed human input (individual and special/cultural): novelty and selection.

The crucial role of selection in Darwinian theory is evident in its eponymous role: “Natural Selection”. In Darwinian accounts, this is driven by the breeding success of individuals in their niche, and certainly the internal dynamics of technology (efficiency, reliability, cost effectiveness, etc.) are one aspect of technological selection. However, as Basella describes in greater detail, there are many human aspects to this as well from the multiple individual consumer choices within a free market to government legislation, for example regulating genome research or establishing emissions limits for cars. This suggest a relationship with technology les like that with an independently evolving wild beast and more like that of the farmer artificially selecting the best specimens.

Returning to the issue of novelty. As I’ve noted even Basella seems to underplay human ingenuity in the stories of particular technologies, and Arthur even more so. Arthur attempts account for “the appearance of radically novel technologies” (NoT,p.17) though composition of components.

One example of this is the ‘invention’ of the cyclotron by Ernest Lawrence (Not,p.114). Lawrence knew of two pieces of previous work: (i) Rolf Wideröe’s idea to accelerate particles using AC current down a series of (very) long tubes, and (ii) the fact that magnetic fields can make charged particles swing round in circles. He put the two together and thereby made the cyclotron, AC currents sending particles ever faster round a circular tube. Lawrence’s first cyclotron was just a few feet across; now, in CERN and elsewhere, they are many miles in diameter, but the principle is the same.

Arthur’s take-home message from this is that the cyclotron did not spring ready-formed and whole from Lawrence’s imagination, like Athena from Zeus’ head. Instead, it was the composition of existing parts. However, the way in which these individual concepts or components fitted together was far from obvious. In many of the case studies the component technology or basic natural phenomena had been around and understood for many years before they were linked together. In each case study it seems to be the vital key in putting together the disparate elements is the human one — heroic inventors after all 🙂

Some aspects of this invention not specifically linked to composition: experimentation and trial-and-error, which effectively try out things in the lab rather than in the market place; the inventor’s imagination of fresh possibilities and their likely success, effectively trail-and-error in the head; and certainly the body of knowledge (the domains in Arthur’s terms) on which the inventor can draw.

However, the focus on components and composition does offer additional understanding of how these ‘breakthroughs’ take place. Randomly mixing components is unlikely to yield effective solutions. Human inventors’ understanding of the existing component technologies allows them to spot potentially viable combinations and perhaps even more important their ability to analyse the problems that arise allow them to ‘fix’ the design.

In my own work in creativity I often talk about crocophants, the fact that arbitrarily putting two things together, even if each is good in its own right, is unlikely to lead to a good combination. However, by deeply understanding each, and why they fit their respective environments, one is able to intelligently combine things to create novelty.

Darwinism and technology

Both Arthur and Basalla are looking for modified version of Darwinism to understand technological evolution. For Arthur it is the way in which technology builds upon components with ‘combinatorial evolution’. While pointing to examples in biology he remarks that “the creation of these larger combined structures is rarer in biological evolution — much rarer — than in technological evolution” (NoT,p.188). Strangely, it is precisely the power of sexual reproduction over simpler mutation, that it allows the ‘construction’ and ‘swopping’ of components; this is why artificial evolutionary algorithms often outperform simple mutation (a form of stochastic hill-climbing algorithm, itself usually better than deterministic hill climbing). However, technological component combination is not the same as biological components.

A core ‘problem’ for biological evolution is the complexity of the genotype–phenotype mapping. Indeed in “The Selfish Gene” Dawkins attacks Lamarckism precisely on the grounds that the mapping is impossibly complex hence cannot be inverted². In fact, Dawkins arguments would also ‘disprove’ Darwinian natural selection as it also depends on the mapping not being too complex. If the mapping between genotype–phenotype were as complex as Dawkins suggested, then small changes to genotypes as gene patterns would lead to arbitrary phenotypes and so fitness of parents would not be a predictor of fitness of offspring. In fact while not simple to invert (as is necessary for Lamarckian inheritance) the mapping is simple enough for natural selection to work!

One of the complexities of the genotype–phenotype mapping in biology is that the genotype (our chromosomes) is far simpler (less information) than our phenotype (body shape, abilities etc.). Also the complexity of the production mechanism (a mothers womb) is no more complex than the final product (the baby). In contrast for technology the genotype (plans, specifications, models, sketches), is of comparable complexity to the final product. Furthermore the production means (factory, workshop) is often far more complex than the finished item (but not always, the skilled woodsman can make a huge variety of things using a simple machete, and there is interesting work on self-fabricating machines).

The complexity of the biological mapping is particularly problematic for the kind of combinatorial evolution that Arthur argues is so important for technological development. In the world of technology, the schematic of a component is a component of the schematic of the whole — hierarchies of organisation are largely preserved between phenotype and geneotype. In contrast, genes that code for finger length are also likely to affect to length, and maybe other characteristics as well.

As noted sexual reproduction does help to some extent as chromosome crossovers mean that some combinations of genes tend to be preserved through breeding, so ‘parts’ of the whole can develop and then be passed together to future generations. If genes are on different chromosomes, this process is a bit hit-and-miss, but there is evidence that genes that code for functionally related things (and therefore good to breed together), end up close on the same chromosome, hence more likely to be passed as a unit.

In contrast, there is little hit-and-miss about technological ‘breeding’ if you want component A from machine X and component B from machine Y, you just take the relevant parts of the plans and put them together.

Of course, getting component A and component B to work together is anther matter, typically some sort of adaptation or interfacing is needed. In biological evolution this is extremely problematic, as Arthur says “the structures of genetic evolution” mean that each step “must produce something viable” NoT,p.188). In contrast, the ability to ‘fix’ the details composition in technology means that combinations that are initially not viable, can become so.

However, as noted at the end of the last section, this is due not just to the nature of technology, but also human ingenuity.

The crucial difference between biology and technology is human design.

technological context and infrastructure

A factor that seems to be weak or missing in both Basella and Arthur’s theories, is the role of infrastructure and general technological and environmental context³. This is highlighted by the development of the wheel.

The wheel and fire are often regarded as core human technologies, but whereas the fire is near universal (indeed predates modern humans), the wheel was only developed in some cultures. It has long annoyed me when the fact that South American civilisations did not develop the wheel is seen as some kind of lack or failure of the civilisation. It has always seemed evident that the wheel was not developed everywhere simply because it is not always useful.

I was wonderful therefore to read Basella’s detailed case study of the wheel (EoT,p.7–11) where he backs up what for me had always been a hunch, with hard evidence. I was aware that the Aztecs had wheeled toys even though they never used wheels for transport. Basella quite sensibly points out that this is reasonable given the terrain and the lack of suitable draught animals. He also notes that between 300–700 AD wheels were abandoned in the Near East and North Africa — wheels are great if you have flat hard natural surfaces, or roads, but not so useful on steep broken hillsides, thick forest, or soft sandy deserts.

In some ways these combinations: wheels and roads, trains and rails, electrical goods and electricity generation can be seen as a form of domain in Arthur’s sense, a “mutually supporting set” of technologies (NoT,p.71), indeed he does talk abut the “canal world” (NoT,p82). However, he is clearly thinking more about the component technologies that make up a new artefact, and less about the set of technologies that need to surround new technology it make it viable.

The mutual interdependence of infrastructure and related artefacts forms another positive feedback loop. In fact, in his discussion of ‘lock-in’, Arthur does talk about the importance of “surrounding structures and organisations”, as a constraint often blocking novel technology, and the way some technologies are only possible because of others (e.g. complex financial derivatives only possible because of computation). However, the best example is Basalla’s description of the of the development of the railroad vs. canal in the American Mid-West (EoT,p.195–197). This is often seen as simply the result of the superiority of the railway, but in the 1960s, Robert Fogel, a historian, made a detailed economic comparison and found that there was no clear financial advantage; it is just that once one began to become dominant the positive feedback effects made it the sole winner.

Arthur’s compositional approach focuses particularly on hierarchical composition, but these infrastructures often cut across components: the hydraulics in a plane, electrical system in a car, or Facebook ‘Open Graph’. And of course one of the additional complexities of biology is that we have many such infrastructure systems in our own bodies blood stream, nervous system, food and waste management.

It is interesting that the growth of the web was possible by a technological context of the existing internet and home PC sales (which initially were not about internet use, even though now this is often the major reason for buying computational devices). However, maybe the key technological context for the modern web is the credit card, it is online payments and shopping, or the potential for them, that has financed the spectacular growth of the area. There would be no web without Berners Lee, but equally without Barclay Card.

see my WebSci’11 paper for more on the ‘fallacy of optimality’[back]
Why Dawkins chose to make such an attack on Lamarckism I’ve never understood, as no-one had believed in it as an explanation for nearly 100 years. Strangely, it was very soon after “The Selfish Gene” was published that examples of Lamarckian evolution were discovered in simple organisms, and recently in higher animals, although in the latter through epigenetic (non-DNA) means.[back]
Basalla does describes the importance of “environmental influences”, but is referring principally to the natural envronment.[back]

tread lightly — controlling user experience pollution

Posted on January 14, 2012 by alan

When thinking about usability or user experience, it is easy to focus on the application in front of us, but the way it impacts its environment may sometimes be far more critical. However, designing applications that are friendly to their environment (digital and physical) may require deep changes to the low-level operating systems.

I’m writing this post effectively ‘offline’ into a word processor for later upload. I sometimes do this as I find it easier to write without the distractions of editing within a web browser, or because I am physically disconnected from the Internet. However, now I am connected, and indeed I can see I am connected as a FTP file upload is progressing, it is just that anything else network-related is stalled.

The reason that the FTP upload is ‘hogging’ the network is, I believe, due to a quirk in the UNIX scheduling system, which was, paradoxically, originally intended to improve interactivity.

UNIX, which sits underneath Mac OS, is a multiprocessing operating system running many programs at once. Each process has a priority, called its ‘niceness‘, which can be set explicitly, but is also tweaked from moment to moment by the operating system. One of the rules for ‘tweaking’ it is that if a process is IO-bound, that is if it is constantly waiting for input or output, then its niceness is decreased, meaning that it is given higher priority.

The reason for this rule is partly to enhance interactive performance in the old days of command line interfaces; an interactive program would spend lots of time waiting for the user to enter something, and so its priority would increase meaning it would respond quickly as soon as the user entered anything. The other reason is that CPU time was seen as the scarce resource, so that processes that were IO bound were effectively being ‘nicer’ to other processes as they let them get a share of the precious CPU.

The FTP program is simply sitting there shunting out data to the network, so is almost permanently blocked waiting for the network as it can read from the disk faster than the network can transmit data. This means UNIX regards it as ‘nice’ and ups its priority. As soon as the network clears sufficiently, the FTP program is rescheduled and it puts more into the network queue, reads the next chunk from disk until the network is again full to capacity. Nothing else gets a chance, no web, no email, not even a network trace utility.

I’ve seen the same before with a database server on one of Fiona’s machines — all my fault. In the MySQL manual it suggested that you disable indices before large bulk updates (e.g. ingesting a file of data) and then re-enable them once the update is finished as indexing is more efficient on lots of data than one at a time. I duly did this and forgot about it until Fiona noticed something was wrong on the server and web traffic had ground to a near halt. When she opened a console on the server, she found that it seemed quiet, very little CPU load at all, and was puzzled until I realised it was my indexing. Indexing requires a lot of reading and writing data to and from disk, so MySQL became IO-bound, was given higher priority, as soon as the disk was free it was rescheduled, hit the disk once more … just as FTP is now hogging the network, MySQL hogged the disk and nothing else could read or write. Of course MySQL’s own performance was fine as it internally interleaved queries with indexing, it is just everything else on the system that failed.

These are hard scenarios to design for. I have written before (“why software need never hang“) about the way application designers do not think sufficiently about potential delays due to slow networks, or broken connections. However, that was about the applications that are suffering. Here the issue is not that the FTP program is badly designed for its delays, it is still responding very happily, just that it has had a knock on effect on the rest of the system. It is like cleaning your sink with industrial bleach — you have a clean house within, but pollute the watercourse without.

These kind of issues are not related solely to network and disk, any kind of resource is limited and profligacy causes damage in the digital world as much as in the physical environment.

Some years ago I had a Symbian smartphone, but it proved unusable as its battery life rarely exceeded 40 minutes from full charge. I thought I had a duff battery, but later realised it was because I was leaving applications on the phone ‘open’. For me I went to the address book, looked up a number, and that was that, I then maybe turned the phone off or switched to something else without ‘exiting’ the address book. I was treating the phone like every previous phone I had used, but this one was different, it had a ‘real’ operating system, opening the address book launched the address book application, which then kept on running — and using power — until it was explicitly closed, a model that is maybe fine for permanently plugged in computers, but disastrous for a moble phone.

When early iPhones came out iOS was criticised for being single threaded, that is not having lots of things running in the ‘background’. However, this undoubtedly helped its battery life. Now, with newer versions of iOS, it has changed and there are lots of apps running at once, and I have noticed the battery life reducing, is that simply the battery wearing out with age or the effect of all those apps running?

Power is of course not just a problem for smartphones, but for any laptop. I try to closedown applications on my Mac when I am working without power as I know some programs just eat CPU when they are apparently idle (yes, Firefox, it’s you I’m talking about). And from an environmental point of view, lower power consumption when connected would also be good. My hope was that Apple would take the lessons learnt in the early iOS to change the nature of their mainstream OS, but sadly they succumbed to the pressure to make iOS a ‘proper’ OS!

Of course the FTP program could try to be friendly, perhaps when it is not the selected window deliberately throttle its network activity. But then the 4 hour upload would take 8 hours, instead of 20 minutes left at this point, I’d be looking forward to another 4 hours and 20 minutes, and I’d be complaining about that.

The trouble is that there needs to be better communication, more knowledge shared, between application and operating system. I would like FTP to use all the network capacity that it can, except when I am interacting with some other program. Either FTP needs to say to the OS “hey here’s a packet, send it when there’s a gap”¹, or the OS needs some way for applications to determine current network state and make decisions based on that. Sometimes this sort of information is easily available, more often it is either very hard to get at or not available at all.

I recall years ago when internet was still mainly through pay-per-minute dial-up connections. You could set your PC to automatically dial when the internet was needed. However, some programs, such as chat, would periodically check with a central server to see if there was activity, this would cause the PC to dial-up the ISP. If you were lucky the PC also had an auto-disconnect after a period of inactivity, if you were not lucky the PC would connect at 2am and by the morning you’d find yourself with a phone bill more than your weeks’ wages.

When we were designing onCue at aQtive, we wanted to be able to connect to the Internet when it was available, but avoid bankrupting our users. Clearly somewhere in the TCP/IP stack, the layers of code over the network, at some level deep down it knew whether we were connected. I recall we found a very helpful function in the Windows API called something like “isConnected”². Unfortunately, it worked by attempting to send a network packet and returning true if it succeeded and false if it failed. Of course sending the test packet caused the PC to auto-dial …

And now there is just 1 minute and 53 seconds left on the upload, so time to finish this post before I get on to garbage collection.

This form of “send when you can” would also be useful in cellular networks, for example when syncing photos.[back]
I had a quick peek, and fund that Windows CE has a function called InternetGetConnectedState. I don’t know if this works better now.[back]

New Year and New Job

Posted on January 5, 2012 by alan

It is a New Year and I am late with my Christmas crackers again!

If you are expecting the annual virtual cracker from me it is coming … but maybe not before Twelfth Night :-/

The New Year is bringing changes, not least, as many already know, I am moving my academic role and taking up a part-time post as professor down in Birmingham University.

At Birmingham I will be joining an established and vibrant HCI centre, including long-term colleague and friend Russell Beale. The group has recently had substantial investment from the University leading to several new appointments including Andrew Howes (who coincidentally also has past Lancaster connections).

The reasons for the move are partly to join this exciting group and partly to simplify life as Talis is based in Birmingham, so just one place to travel to regularly, and one of my daughters also there.

Of course this also means I will be leaving many dear colleagues and friends at Lancaster, but I do expect to continue to work with many and am likely to retain a formal or informal role there for some time.

As well as moving institutions I am also further reducing my percentage of academic time — typically I’ll be just one day a week academic. So, apologies in advance if my email responses becomes even more sporadic and I turn down (or fail to answer :-() requests for reviews, PhD exams, etc.

Although moving institutions, I will, of course, continue to live up in Tiree (wild and windy, but, at the moment, so is everywhere!), so will still be travelling up and down the country; I’ll wave as I pass!

… and there will be another Tiree Tech Wave in March 🙂

book: The Unfolding of Language, Deutscher

Posted on September 8, 2011 by alan

I have previously read Guy Deutscher‘s “Through the Language Glass“, and have now, topsy turvy, read his earlier book “The Unfolding of Language“. Both are about language, “The Unfolding of Language” about the development of the complexity of language that we see today from simpler origins, and “Through the Language Glass” about the interaction between language and thought. Both are full of sometimes witty and always fascinating examples drawn from languages around the world, from the Matses in the Amazon to Ancient Sumarian.

I recall my own interest in the origins of language began young, as a seven year old over breakfast one day, asking whether ‘night, was a contraction of ‘no light’. While this was an etymological red herring, it is very much the kind of change that Deutscher documents in detail showing the way a word accretes beginnings and ending through juxtaposition of simpler words followed by erosion of hard to pronounce sounds.

One of my favourites examples was the French “aujourd’hui”. The word ‘hui, was Old French for ‘today’, but was originally Latin “hoc die”, “(on) this day”. Because ‘hui’ is not very emphatic it became “au jour d’hui”, “on the day of this day” , which contracted to the current ‘aujourd’hui’. Except now to add emphasis some French speakers are starting to say “au jour aujourd’hui”, “on the day on the day of this day”! This reminds me of Longsleddale in the Lake District (inspiration for Postman Pat‘s Greendale), a contraction of “long sled dale”, which literally means “long valley valley” from Old English “slaed” meaning “valley” … although I once even saw something suggesting that ‘long’ itself in the name was also “valley” in a different language!

Deutscher gives many more prosaic examples where words meaning ‘I’, ‘you’, ‘she’ get accreted to verbs to create the verb endings found in languages such as French, and how prepositions (themselves metaphorically derived from words like ‘back’) were merged with nouns to create the complex case endings of Latin.

However, the most complex edifice, which Deutscher returns to repeatedly, is that of the Semitic languages with a template system of vowels around three-consonant roots, where the vowel templates change the meaning of the root. To illustrate he uses the (fictional!) root ‘sng’ meaning ‘to snog’ and discusses how first simple templates such as ‘snug’ (“I snogged”) and then more complex constructions such as ‘hitsunnag’ (“he was made to snog himself”) all arose from simple processes of combination, shortening and generalisation.

“The Unfolding of Language” begins with the 19th century observation that all languages seem to be in a process of degeneration where more complex forms such as the Latin case system or early English verb endings are progressively simplified and reduced. The linguists of the day saw all languages in a state of continuous decay from an early linguistic Golden Age. Indeed one linguist, August Schleicher, suggested that there was a process where language develops until it is complex enough to get things done, and only then recorded history starts, after which the effort spent on language is instead spent in making history.

As with geology, or biological evolution, the modern linguist rejects this staged view of the past, looking towards the Law of Uniformitarianism, things are as they have always been, so one can work out what must have happened in the pre-recorded past by what is happening now. However, whilst generally finding this convincing, throughout the book I had a niggling feeling that there is a difference. By definition, those languages for which we have written records are those of large developed civilisations, who moreover are based on writing. Furthermore I am aware that for biological evolution small isolated groups (e.g. on islands or cut off in valleys) are particularly important for introducing novelty into larger populations, and I assume the same would be true of languages, but somewhat stultified by mass communication.

Deutscher does deal with this briefly, but right at the very end in a short epilogue. I feel there is a whole additional story about the interaction between culture and the grammatical development of language. I recall in school a teacher explained how in Latin the feminine words tended to belong to the early period linked to agriculture and the land, masculine words for later interests in war and conquest, and neuter for the still later phase of civic and political development. There were many exceptions, but even this modicum of order helped me to make sense of what otherwise seemed an arbitrary distinction.

The epilogue also mentions that the sole exception to the ‘decline’ in linguistic complexity is Arabic with its complex template system, still preserved today.

While reading the chapters about the three letter roots, I was struck by the fact that both Hebrew an Arabic are written as consonants only with vowels interpolated by diacritical marks or simply remembered convention (although Deutscher does not mention this himself). I had always assumed that this was like English where t’s pssble t rd txt wth n vwls t ll. However, the vowels are far more critical for Semitic languages where the vowel-less words could make the difference between “he did it” and “it will be done to him”. Did this difference in writing stem from the root+template system, or vice versa, or maybe they simply mutually reinforced each other?

The other factor regarding Arabic’s remarkable complexity must surely be the Quran. Whereas the Bible was read for a over a millennium in Latin, a non-spoken language, and later translated focused on the meaning; in contrast there is a great emphasis on the precise form of the Quran together with continuous lengthy recitation. As the King James Bible has been argued to have been a significant influence on modern English since the 17th century, it seems likely the Quran has been a factor in preserving Arabic for the last 1500 years.

Early in “The Unfolding of Language” Deutscher dismisses attempts to look at the even earlier prehistoric roots of language as there is no direct evidence. I assume that this would include Mithin’s “The Singing Neanderthals“, which I posted about recently. There is of course a lot of truth in this criticism; certainly Mithin’s account included a lot of guesswork, albeit founded on paleontological evidence. However, Deutscher’s own arguments include extrapolating to recent prehistory. These extrapolations are based on early written languages and subsequent recorded developments, but also include guesswork between the hard evidence, as does the whole family-tree of languages. Deutscher was originally a Cambridge mathematician, like me, so, perhaps unsurprisingly, I found his style of argument convincing. However, given the foundations on Uniformitarianism, which, as noted above, is at best partial when moving from history to pre-history, there seems more of a continuum rather than sharp distinction between the levels of interpretation and extrapolation in this book and Mithin’s.

Deutscher’s account seeks to fill in the gap between the deep prehistoric origins of protolanguage (what Deutscher’s calls ‘me Tarzan’ language) and its subsequent development in the era of media-society (starting 5000BC with extensive Sumerian writing). Rather than seeing these separately, I feel there is a rich account building across various authors, which will, in time, yield a more complete view of our current language and its past.

book: The Singing Neanderthals, Mithin

Posted on August 19, 2011 by alan

One of my birthday presents was Steven Mithin’s “The Singing Neanderthals” and, having been on holiday, I have already read it! I read Mithin’s “The Prehistory of the Mind” some years ago and have referred to it repeatedly over the years¹, so was excited to receive this book, and it has not disappointed. I like his broad approach taking evidence from a variety of sources, as well as his own discipline of prehistory; in times when everyone claims to be cross-disciplinary, Mithin truly is.

“The Singing Neanderthal”, as its title suggests, is about the role of music in the evolutionary development of the modern human. We all seem to be born with an element of music in our heart, and Mithin seeks to understand why this is so, and how music is related to, and part of the development of, language. Mithin argues that elements of music developed in various later hominids as a form of primitive communication², but separated from language in homo sapiens when music became specialised to the communication of emotion and language to more precise actions and concepts.

The book ‘explains’ various known musical facts, including the universality of music across cultures and the fact that most of us do not have perfect pitch … even though young babies do (p77). The hard facts of how things were for humans or related species tens or hundreds of thousands of years ago are sparse, so there is inevitably an element of speculation in Mithin’s theories, but he shows how many, otherwise disparate pieces of evidence from palaeontology, psychology and musicology make sense given the centrality of music.

Whether or not you accept Mithin’s thesis, the first part of the book provides a wide ranging review of current knowledge about the human psychology of music. Coincidentally, while reading the book, there was an article in the Independent reporting on evidence for the importance of music therapy in dealing with depression and aiding the rehabilitation of stroke victims³, reinforcing messages from Mithin’s review.

The topic of “The Singing Neanderthal” is particularly close to my own heart as my first personal forays into evolutionary psychology (long before I knew the term, or discovered Cosmides and Tooby’s work), was in attempting to make sense of human limits to delays and rhythm.

Those who have been to my lectures on time since the mid 1990s will recall being asked to first clap in time and then swing their legs ever faster … sometimes until they fall over! The reason for this is to demonstrate the fact that we cannot keep beats much slower than one per second⁴, and then explain this in terms of our need for a mental ‘beat keeper’ for walking and running. The leg shaking is to show how our legs, as a simple pendulum, have a natural frequency of around 1Hz, hence determining our slowest walk and hence need for rhythm.

Mithin likewise points to walking and running as crucial in the development of rhythm, in particular the additional demands of bipedal motion (p150). Rhythm, he argues, is not just about music, but also a shared skill needed for turn-taking in conversation (p17), and for emotional bonding.

In just the last few weeks, at the HCI conference in Newcastle, I learnt that entrainment, when we keep time with others, is a rare skill amongst animals, almost uniquely human. Mithin also notes this (p206), with exceptions, in particular one species of frog, where the males gather in groups to sing/croak in synchrony. One suggested reason for this is that the louder sound can attract females from a larger distance. This cooperative behaviour of course acts against each frog’s own interest to ‘get the girl’ so they also seek to out-perform each other when a female frog arrives. Mithin imagines that similar pressures may have sparked early hominid music making. As well as the fact that synchrony makes the frogs louder and so easy to hear, I wonder whether the discerning female frogs also realise that if they go to a frog choir they get to chose amongst them, whereas if they follow a single frog croak they get stuck with the frog they find; a form of frog speed dating?

Mithin also suggests that the human ability to synchronise rhythm is about ‘boundary loss’ seeing oneself less as an individual and more as part of a group, important for early humans about to engage in risky collaborative hunting expeditions. He cites evidence of this from the psychology of music, anthropology, and it is part of many people’s personal experience, for example, in a football crowd, or Last Night at the Proms.

This reminds me of the experiments where a rubber hand is touched in time with touching a person’s real hand; after a while the subject starts to feel as if the rubber hand is his or her own hand. Effectively our brain assumes that this thing that correlates with feeling must be part of oneself⁵. Maybe a similar thing happens in choral singing, I voluntarily make a sound and simultaneously everyone makes the sound, so it is as if the whole choir is an extension of my own body?

Part of the neurological evidence for the importance of group music making concerns the production of oxytocin. In experiments on female prairie voles that have had oxytocin production inhibited, they engage in sex as freely as normal voles, but fail to pair bond (p217). The implication is that oxytocin’s role in bonding applies equally to social groups. While this explains a mechanism by which collaborative rhythmic activities create ‘boundary loss’, it doesn’t explain why oxytocin is created through rhythmic activity in the first place. I wonder if this is perhaps to do with bipedalism and the need for synchronised movement during face-to-face copulation, which would explain why humans can do synchronised rhythms whereas apes cannot. That is, rhythmic movement and oxytocin production become associated for sexual reasons and then this generalises to the social domain. Think again of that chanting football crowd?

I should note that Mithin also discusses at length the use of music in bonding with infants, as anyone who has sung to a baby knows, so this offers an alternative route to rhythm & bonding … but not one that is particular to humans, so I will stick with my hypothesis 😉

Sexual selection is a strong theme in the book, the kind of runaway selection that leads to the peacock tail. Changing lifestyles of early humans, in particular longer periods looking after immature young, led to a greater degree of female control in the selection of partners. As human size came close to the physical limits of the environment (p185), Mithin suggests that other qualities had to be used by females to choose their mate, notably male singing and dance – prehistoric Saturday Night Fever.

As one evidence for female mate choice, Mithin points to the overly symmetric nature of hand axes and imagines hopeful males demonstrating their dexterity by knapping ever more perfect axes in front of admiring females (p188). However, this brings to mind Calvin’s “Ascent of Mind“, which argues that these symmetric, ovoid axes were used like a discus, thrown into the midst of a herd of prey to bring one down. The two theories for axe shape are not incompatible. Calvin suggests that the complex physical coordination required by axe throwing would have driven general brain development. In fact these forms of coordination, are not so far from those needed for musical movement, and indeed expert flint knapping, so maybe it was this skills that were demonstrated by the shaping of axes beyond that immediately necessary for purpose.

Mithin’s description of the musical nature of mother-child interactions also brought to mind Broomhall’s “Eternal Child“. Broomhall ‘s central thesis is that humans are effectively in a sort of arrested development with many features, not least our near nakedness, characteristic of infants. Although it was not one of the points Broomhall makes, his arguments made sense to me in terms of the mental flexibility that characterises childhood, and the way this is necessary for advanced human innovation; I am always encouraging students to think in a more childlike way. If Broomhall’s theories were correct, then this would help explain how some of the music making more characteristic of mother-infant interactions become generalised to adult social interactions.

I do notice an element of mutual debunking amongst those writing about richer cognitive aspects of early human and hominid development. I guess a common trait in disciplines when evidence is thin, and theories have to fill a lot of blanks. So maybe Mithin, Calvin and Broomhall would not welcome me bringing their respective contributions together! However, as in other areas where data is necessarily scant (such as sub-atomic physics), one does feel a developing level of methodological rigour, and the fact that these quite different theoretical approaches have points of connection, does suggest that a deeper understanding of early human cognition, while not yet definitive, is developing.

In summary, and as part of this wider unfolding story, “The Singing Neanderthal” is an engaging and entertaining book to read whether you are interested in the psychological and social impact of music itself, or the development of the human mind.

… and I have another of Mithin’s books in the birthday pile, so looking forward to that too!

See particularly my essay on the role of imagination in bringing together our different forms of ‘specialised intelligence’. “The Prehistory of the Mind” highlighted the importance of this ‘cognitive fluidity’, linking social, natural and technological thought, but lays this largely in the realm of language. I would suggest that imagination also has this role, creating a sort of ‘virtual world’ on which different specialised cognitive modules can act (see “imagination and rationality“).[back]
He calls this musical communication system Hmmmm in its early form – Holistic, Multiple-Modal, Manipulative and Musical, p138 – and later Hmmmmm – Holistic, Multiple-Modal, Manipulative, Musical and Mimetic, p221.[back]
“NHS urged to pay for music therapy to cure depression“, Nina Lakhani, The Independent, Monday, 1 August 2011[back]
Professional conductors say 40 beats per minute is the slowest reliable beat without counting between beats.[back]
See also my previous essay on “driving as a cyborg experience“.[back]

book: The Laws of Simplicty, Maeda

Posted on July 23, 2011 by alan

Yesterday I started to read John Maeda’s “The Laws of Simplicty” whilst sitting by Fiona’s stall at the annual Tiree agricultural show, then finished before breakfast today. Maeda describes his decision to cap at 100 pages¹ as something that could be read during a lunch break. To be honest 30,000 words sounds like a very long lunch break or a very fast reader, but true to his third law, “savings in time feel like simplicity”², it is a short read.

The shortness is a boon that I wish many writers would follow (including me). As with so many single issue books (e.g. Blink), there is s slight tendency to over-sell the main argument, but this is forgiveable in a short delightful book, in a way that it isn’t in 350 pages of less graceful prose.

I know I have a tendency, which can be confusing or annoying, to give, paradoxically for fear of misunderstanding, the caveat before the main point. Still, despite knowing this, in the early chapters I did find myself occasionally bristling at Maeda’s occasional overstatement (although in accordance with simplicity, never hyperbole).

One that particularly caught my eye was Maeda’s contrast of the MIT engineer’s RFTM (Read The F*cking Manual) with the “designer’s approach” to:

marry function with form to create intuitive experiences that we understand immediately.

Although in principle I agree with the overall spirit, and am constantly chided by Fiona for not reading instructions³, the misguided idea that everything ought to ‘pick up and use’ has bedeviled HCI and user interface design for at least the past 20 years. Indeed this is the core misconception about Heidegger’s hammer example that I argued against in a previous post “Struggling with Heidegger“. In my own reading notes, my comment is “simple or simplistic!” … and I meant here the statement not the resulting interfaces, although it could apply to both.

It has always been hard to get well written documentation, and the combination of single page ‘getting started’ guides with web-based help, which often disappears when the web site organisation changes, is an abrogation of responsibility by many designers. Not that I am good at this myself. Good documentation is hard work. It used to be the coders who failed to produce documentation, but now the designers also fall into this trap of laziness, which might be euphemistically labelled ‘simplicity’⁴.

Personally, I have found that the discipline of documenting (in the few times I have observed it!) is in fact a great driver of simple design. Indeed I recall a colleague, maybe Harold Thimbleby⁵, once suggested that documentation ought to be written before any code is written, precisely to ensure simple use.

Some years ago I was reading a manual (for a Unix workstation, so quite a few years ago!) that described a potentially disastrous shortcoming of a the disk sync command (which could have corrupted the disk). Helpfully the manual page included a suggestion of how to wrap sync in scripts that prevented the problem. This seemed to add insult to injury; they knew there was a serious problem, they knew how to fix it … and they didn’t do it. Of course, the reason is that manuals are written by technical writers after the code is frozen.

In contrast, I was recently documenting an experimental API⁶ so that a colleague could use it. As I wrote the documentation I found parts hard to explain. “It would be easier to change the code”, I thought, so I did so. The API, whilst still experimental, is now a lot cleaner and simpler.

Coming back to Maena after a somewhat long digression (what was that about simplicity and brevity?). While I prickled slightly at a few statements, in fact he very clearly says that the first few concrete ‘laws’ are the simpler (and if taken in their own simplistic), the later laws are far more nuanced and suggest deeper principles. This includes law 5 “differences: simplicity and complexity need each other”, which suggest that one should strive for a dynamic between simplicity and complexity. This echoes the emphasis on texture I often advocate when talking with students; whether in writing, presenting or in experience design it is often the changes in voice, visual appearance, or style which give life.

the simplest interface?

I wasn’t convinced by Maeda’s early claim that simple designs were simpler and cheaper to construct. Possibly true for physical prodcuts, but rarely so for digital interfaces, where more effort is typically needed in code to create simpler user interfaces. However, again this was something that was revisited later, especially in the context of more computationally active systems (“law 8, in simplicity we trust”), where he contrasts “how much do you need to know about a system?” with “how much does the system know about you?”. The former is the case of more traditional passive systems, whereas more ‘intelligent’ systems such as Amazon recommendations (or even Facebook news feed) favour the latter. This is very similar to the principles for incidental and low-intention interaction that I have discussed in the past⁷.

Finally “The Laws of Simplicity” is beautifully designed in itself. It includes many gems not least those arising from Maeda’s roots in Japanese design culture, including aichaku, the “sense of attachment one can feel for an artefact” (p.69) and omakase meaning “I leave it to you”, which asks the sushi chef to create a meal especially for you (p.76). I am perhaps too much of a controller to feel totally comfortable with the latter, but Maeda’s book certainly inspires the former.

In fact there are 108 pages in the main text, but 9 of these are full page ‘law/chapter’ frontispieces, so 99 real pages. However, if you include the 8 page introduction that gives 107 … so even the 100 page cap is perhaps a more subtle concept than a strict count.[back]
See his full 10 laws of simplicity at lawsofsimplicity.com[back]
My guess is that the MIT engineers didn’t read the manuals either.[back]
Apple is a great — read poor — example here as it relies on keen technofreaks to tell others about the various hidden ways to do things — I guess creating a Gnostic air to the devices.[back]
Certainly Harold was a great proponent of ‘live’ documentation, both Knuth’s literate programming and also documentation that incorporated calculated input and output, rather like dexy, which I reported after last autumn’s Web Art/Science camp.[back]
In fairness, the API had been thrown together in haste for my own use.[back]
See ‘incidental interaction” and HCI book chapter 18.[back]

Six weeks on the road

Posted on July 20, 2011 by alan

I’ve been at home for the last week after six weeks travelling around the UK and elsewhere. I’ve not kept up while on the road so doing a retrospective post on it all and need to try to catch on other half written posts.

As well as time at Talis offices in B’ham and at Lancs (including exam board week), travels have taken me to Pisa for a workshop on ‘Supportive User Interfaces’, to Koblenz for Web Science conference giving a talk on embodiment issues and a poster on web-scale reasoning , to Newcastle for British HCI conference doing a talk on fridge, to Nottingham to give a talk on extended episodic experience, and back to Lancs for a session on creativity! Why can’t I be like sensible folks and talk on one topic!

Supportive User Interfaces

Monday 13th June I attended a workshop in Pisa on “Supportive User Interfaces“, which includes interfaces that adapt in various ways to users. The majority of people there were involved in various forms of model-based user interfaces in which various models of the task, application and interaction are used to generate user interfaces on the fly. W3C have had a previous group in this area; Dave Raggett from w3c was at the workshop and it sounds like there will be a new working group soon. This clearly has strong links to various forms of ‘meta-level’ representations of data, tasks, etc.. My own contribution started the day, framing the area, focusing partly on reasons for having more ‘meta-level’ interfaces including social empowerment, and partly on the principles/techniques that need to be considered at a human level.

Also on Monday was a meeting of IFIP Working Group 2.7/13.4. IFIP is the UNESCO founded pan-national agency that national computer societies such as as the BCS in the UK and ACM and IEEE Computer in the US belong to. Working Group 2.7/13.4 is focused on the engineering of user interfaces. I had been actively involved in the past, but have had many years’ lapse. However, this seemed a good thing to re-engage with with my new Talis hat on!

SUI: paper:

Opening the Box: Meta-level Interfaces Needs and Solutions

Web Science Conference in Koblenz

Jaime Teevan from Microsoft gave the opening keynote at WebSci 2011. I know her from her earlier work on personal information management, but her recent work and keynote was about work on analysing and visualising changes in web pages. Web page changes are also analysed alongside users re-visitation patterns; by looking at the frequency of re-visitation Jaime and her colleagues are able to identify the parts of pages that change with similar frequency, helping them, inter alia, to improve search ranking.

Had many great conversations, some with people I know previously (e.g. the Southampton folks), but also new, including the group at Troy that do lots of work with data.gov. I was particularly interested in some work using content matching to look for links between otherwise unlinked (or only partly inter-linked) datasets. Also lots of good presentations including one on trust prediction and a fantastic talk by Mark Bernstein from Eastgate, which he delivered in blank verse!

My own contribution included the poster that Dave@Talis prepared, which was on the web-scale spreading activation work in collaboration with Univ. Athens. Quite a niche area in a multi-disciplinary conference, so didn’t elicit quite the interest of the social networking posters, but did lead to a small number of in depth discussions.

In addition I gave talk on the more cognitive/philosophical issues when we start to use the web as an external extension to / replacement of memory, including its impact on education. Got some good feedback from this.

Closing keynote was from Barry Wellman, the guy who started social network analysis way before they were on computers. At one point he challenged the Dunbar number¹. I wondered whether this was due to cognitive extension with address books etc., but he didn’t seem to think so; there is evidence that some large circles predate web (although maybe not physical address books). Made me wonder about itinerant tradesmen, tinkers, etc., even with no prostheses. Maybe the numbers sort of apply to any single content, but are repeated for each new context?

WebSci papers:

The HCI Conference – Newcastle

I attended the British HCI conference in Newcastle. This was the 25th conference, and as my very first academic paper in computing² was at the first BHCI in 1984, I was pleased to be there at this anniversary. The paper I was presenting was a retrospective on vfridge, a social networking site dating back to 1999/2000, it seemed an historic occasion!

As is always the case presentations were all interesting. Strictly BHCI is a ‘second tier’ conference compared with CHI, but why is it that the papers are always more interesting, that I learn more? It is likely that a fair number of papers were CHI rejects, so it should be the other way round – is it that selectivity and ‘quality’ inevitably become conservative and boring?

Gregory Abowd gave the closing keynote. It was great to see Gregory again, we meet too rarely. The main focus of his keynote was on three aspects of research: novelty, value and reliability and how his own work had moved within this space over the years. In particular having two autistic sons has led him in directions he would never have considered, and this immediately valuable work has also created highly novel research. Novelty and value can coexist.

Gregory also reflected on the BHCI conference as it was his early academic ‘home’ when he did his PhD and postdoctoral here in the late 1980’s. He thought that it could be rather than, as with many conferences, a second best to getting a CHI paper, instead a place for (not getting the quote quite perfect) “papers that should get into CHI”, by which he meant a proving ground for new ideas that would then go on to be in CHI.

However I initially read the quote differently. BHCI always had a broader concept of HCI compare with CHI’s quite limited scope. That is BHCI as a place that points the way for the future of HCI, just as it was the early nurturing place of MobileHCI. However CHI has now become much broader in it’s own conception, so maybe this is no longer necessary. Indeed at the althci session the organisers said that their only complaint was that the papers were not ‘alt’ enough – that maybe ‘alt’ had become mainstream. This prompted Russell Beale to suggest that maybe althci should now be real science such as replication!

Gregory also noted the power of the conference as a meeting ground. It has always been proud of the breadth of international attendance, but perhaps it is UK saturation that should be it’s real measure of success. Of course the conference agenda has become so full and international travel so much cheaper than it was, so there is a tendency to go to the more topic specific international conferences and neglect the UK scene. This is compounded by the relative dearth of small UK day workshops that used to be so useful in nurturing new researchers.

I feel a little guilty here as this was the first BHCI I had been to since it was in Lancaster in 2007 … as Tom McEwan pointed out I always apologise but never come! However, to be fair I have also only been twice to CHI in the last 10 years, and then when it was in Vienna and Florence. I have just felt too busy, so avoiding conferences that I did not absolutely have to attend.

In response to Gregory’s comments, someone, maybe Tom, mentioned that in days of metrics-based research assessment there was a tendency to submit one’s best work to those venues likely to achieve highest impact, hence the draw of CHI. However, I have hardly ever published in CHI and I think only once in TOCHI, yet, according to Microsoft Research, I am currently the most highly cited HCI researcher over the last 5 years … So you don’t have to publish in CHI to get impact!

And incidentally, the vfridge paper had NOT been submitted to CHI, but was specially written for BHCI as it seemed the fitting place to discuss a thoroughly British product 🙂

vfridge paper:

Anatomy of an Early Social Networking Site

Nottingham MRL

I was at Mixed Reality Lab in Nottingham for Joel Fischer‘s PhD viva and while there did a seminar the afternoon on “extended episodic experience” based on Haliyana Khalid‘s PhD work and ideas that arose from it. Basically, whereas ‘user experience’ has become a big issue most of the work is focused on individual ‘experiences’ whereas much of life consists of ongoing series of experiences (episodes) which together make up the whole experience of interacting with a person or place, following a band, etc.

I had obviously not done a good enough job at wearing Joel down with difficult questions in the PhD viva in the morning as he was there in the afternoon to ask difficult questions back of his own 😉

Docfest – Digital Economy Summer School

The last major event was Docfest, which brought together the PhD students from the digital economy centres from around the country. Not sure of the exact count but just short of 150 participants I think. They come from a wide variety of backgrounds, business, design, computing, engineering, and many are mature students with years of professional experience behind them.

This looked like being a super event, unfortunately I was only able to attend for a day 🙁 However, I had a great evening at the welcome event talking with many of the students and even got to ride in Steve Forshaw‘s Sinclair C5!

My contribution to the event was running the first morning session on ‘creativity’. Surprise, surprise this started with a bad ideas session, but new for me too as the largest group I’ve run in the past has been around 30. There were a number of local Highwire students acting as facilitators for the groups, so I had only to set them off and observe results :-). At the end of the morning I gave some the theoretical background to bad ideas as a method and in understanding (aspects of) creativity more widely.

Other speakers at the event included Jane Prophet, Chris Csikszentmihalyi and Chris Bonnington, so was sad to miss them; although I did get a fascinating chat with Jane over breakfast in the hotel hearing about her new projects on arts and neural imaging, and on how repetitious writing induces temporary psychosis … That is why the teachers give lines, to send the pupils bonkers!

The idea that there are fundamental cognitive limits on social groups with different sized circles family~6, extended family~20, village~60, large village~200[back]
I had published previously in agricultural engineering.[back]

Are five users enough?

Posted on June 4, 2011 by alan

[An edited version of this post is reproduced at HCIbook online!]

I recently got a Facebook message from a researcher about to do a study who asked, “Do you think 5 (users) is enough?”

Along with Fitts’ Law and Miller’s 7+/-2 this “five is enough” must be among the most widely cited, and most widely misunderstood and misused aphorisms of HCI and usability. Indeed, I feel that this post belongs more in ‘Myth Busters” than in my blog.

So, do I think five is enough? Of course, the answer is (annoyingly), “it depends”, sometimes, yes, five is enough, but sometimes, fewer: one appropriate user, or maybe no users at all, and sometimes you need more, maybe many more: 20, 100, 1000. But even when five is enough, the reasons are usually not directly related to Nielsen and Landauer’s original paper, which people often cite (although rarely read) and Nielsen’s “Why You Only Need to Test With 5 Users” alert box (probably more often actually read … well at least the title).

The “it depends” is partly dependent on the particular circumstances of the situation (types of people involved, kind of phenomenon, etc.), and partly on the kind of question you want to ask. The latter is the most critical issue, as if you don’t know what you want to know, how can you know how many users you need?

There are several sorts of reasons for doing some sort of user study/experiment, several of which may apply:

1. To improve a user interface (formative evaluation)

2. To assess whether it is good enough (summative evaluation)

3. To answer some quantitative question such as “what % of users are able to successfully complete this task”

4. To verify or refute some hypothesis such as “does eye gaze follow Fitts’ Law”

5. To perform a broad qualitative investigation of an area

6. To explore some domain or the use of a product in order to gain insight

It is common to see HCI researchers very confused about these distinctions, and effectively perform formative/summative evaluation in research papers (1 or 2) where in fact one of 3-6 is what is really needed.

I’ll look at each in turn, but first to note that, to the extent there is empirical evidence for ‘five is enough”, it applies to the first of these only.

I dealt with this briefly in my paper “Human–Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail” in the John Long Festschrift edition of IwC, and quote here:

In the case of the figure of five users, this was developed based on a combination of a mathematical model and empirical results (Nielsen and Landauer 1993). The figure of five users is:

(i) about the optimal cost/benefit point within an iterative development cycle, considerably more users are required for summative evaluation or where there is only the opportunity for a single formative evaluation stage;

(ii) an average over a number of projects and needs to be assessed on a system by system basis; and

(iii) based on a number of assumptions, in particular, independence of faults, that are more reasonable for more fully developed systems than for early prototypes, where one fault may mask another.

We’ll look at this in more detail below, but critically, the number ‘5’ is not a panacea, even for formative evaluation.

As important as the kind of question you are asking, are the kind of users you are using. So much of modern psychology is effectively the psychology of first year psychology undergraduates (in the in 1950s it was male prisoners). Is this representative? Does this matter? I’ll return to this at the end, but first of all look briefly at each kind of question.

Finally, there is perhaps the real question “will the reviewers of my work think five users is enough” — good publications vs. good science. The answer is that they will certainly be as influenced by the Myth of Five Users as you are, so do good science … but be prepared to need to educate your reviewers too!

formative evaluation – prototyping cycle

As noted formative evaluation was the scope of Nielsen and Landauer’s early work in 1993 that was then cited by Nielsen in his Alert Box in 2000, and which has now developed mythic status in the field.

The 1993 paper was assuming a context of iterative development where there would be many iterations, and asking how many users should be used per iteration, that is how many users should you test before fixing the problems found by those users, and then performing another cycle of user testing, and another. That is, in all cases they considered, the total number of users involved would be far more than five, it is just the number used in each iteration that was lower.

In order to calculate the optimal number of subjects to use per iteration, they looked at:

(i) the cost of performing a user evaluation

(ii) the number of new faults found (additional users will see many of the same faults, so there are diminishing returns)

(iii) the cost of a redevelopment cycle

All of these differ depending on the kind of project, so Nielsen and Landauer looked at a range of projects of differing levels of complexity. By putting them together, and combining with simple probabilistic models of bug finding in software, you can calculate an optimal number of users per experiment.

They found that, depending on the project, the statistics and costs varied and hence the optimal number of users/evaluators (between 7 and 21), with, on the whole, more complex projects (with more different kinds of bugs and more costly redevelopment cycles) having a higher optimal number than simpler projects. In fact all the numbers are larger than five, but five was the number in Nielsen’s earlier discount engineering paper, so the paper did some additional calculations that yielded a different kind of (lower) optimum (3.2 users — pity the last 0.2 user), with five somewhere between 7 and 3 … and a myth was born!

Today, with Web 2.0 and ‘perpetual beta’, agile methods and continuous deployment reduce redevelopment costs to near zero, and so Twidale and Marty argue for ‘extreme evaluation‘ where often one user may be enough (see also my IwC paper).

The number also varies through the development process; early on, one user (indeed using it yourself) will find many, many faults that need to be fixed. Later faults become more obscure, or maybe only surface after long-term use.

Of course, if you use some sort of expert or heuristic evaluation, then the answer may be no real users at all!

And anyway all of this is about ‘fault finding’, usability is not about bug fixing but making things better, it is not at all clear how useful, if at all, literature on bug fixing is for creating positive experiences.

summative evaluation – is it good enough to release

If you are faced with a product and want to ask “is it good enough?” (which can mean, “are there any usability ‘faults’?”, or, “do people want to use it?”), then five users is almost certainly not enough. To give yourself any confidence of coverage of the kinds of users and kinds of use situations, you may need tens or hundreds of users, very like hypothesis testing (below).

However, the answer here may also be zero users. If the end product is the result of a continuous evaluation process with one, five or some other number of users per iteration, then the number of users who have seen the product during this process may be sufficient, especially if you are effectively iterating towards a steady state where few or no new faults are found per iteration.

In fact, even when there has been a continuous process, the need for long-term evaluation becomes more critical as the development progresses, and maybe the distinction between summative and late-stage formative is moot.

But in the end there is only one user you need to satisfy — the CEO … ask Apple.

quantitative questions and hypothesis testing

(Note, there are real numbers here, but if you are a numerophobe never fear, the next part will go back to qualitative issues, so bear with it!)

Most researchers know that “five is enough” does not apply in experimental or quantitative studies … but that doesn’t always stop them quoting numbers back!

Happily in dealing with more quantitative questions or precise yes/no ones, we can look to some fairly solid statistical rules for the appropriate number of users for assessing different kinds of effects (but do note “the right kind of users” below). And yes, very, very occasionally five may be enough!

Let’s imagine that our hypothesis is that a behaviour will occur in 50% of users doing an experiment. With five users, the probability that we will see this behaviour in at least one user is 1 in 32, which is approximately 3%. That is if we do not observe the behaviour at all, then we have a statistically significant result at 5% level (p<0.05) and can reject the hypothesis.

Note that there is a crucial difference between a phenomenon that we expect to see in about 50% of user iterations (i.e. the same user will do it about 50% of the time) and one where we expect 50% of people to do it all of the time. The former we can deal with using a small number of users and maybe longer or repeated experiments, the latter needs more users.

If instead, we wanted to show that a behaviour happens less than 25% of the time, then we need at least 11 users, for 10% 29 users. On the other hand, if we hypothesised that a behaviour happens 90% of the time and didn’t see it in just two users we can reject the hypothesis at significance level of 1%. In the extreme if our hypothesis is that something never happens and we see it with just one user, or if the hypothesis is that it always happens and we fail to see it with one user, in both cases we can reject our hypothesis.

The above only pertains when you see samples where either none or all of the users do something. More often we are trying to assess some number. Rather than “does this behaviour occur 50% of the time”, we are asking “how often does this behaviour occur”.

Imagine we have 100 users (a lot more than five!), and notice that 60% do one thing and 40% do the opposite. Can we conclude that in general the first thing is more prevalent? The answer is yes, but only just. Where something is a simple yes/no or either/or choice and we have counted the replies, we have a binomial distribution. If we have n (100) users and the probability of them answering ‘yes’ is p (50% if there is no real preference), then the maths says that the average number of times we expect to see a ‘yes’ response is n x p = 100 x 0.5 = 50 people — fairly obvious. It also says that the standard deviation of this count is sqrt(n x p x (1-p ) ) = sqrt(25) = 5. As a rule of thumb if answers differ by more than 2 standard deviations from the expected value, then this is statistically significant; so 60 ‘yes’ answers vs. the expected 50 is significant at 5%, but 55 would have just been ‘in the noise’.

Now drop this down to 10 users and imagine you have 7 ‘yes’s and 3 ‘no’s. For these users, in this experiment, they answer ‘yes’ more than twice as often as ‘no’, but here this difference is still no more than we might expect by chance. You need at least 8 to 2 before you can say anything more. For five users even 4 to 1 is quite normal (try tossing five coins and see how many come up heads); only if all or none do something can you start to think you are onto something!

For more complex kinds of questions such as “how fast”, rather than “how often”, the statistics becomes a little more complex, and typically more users are needed to gain any level of confidence.

As a rule of thumb some psychologists talk of 20 users per condition, so if you are comparing 4 things then you need 80 users. However, this is just a rule of thumb and some phenomena have very high variability (e.g. problem solving) whereas others (such as low-level motor actions) are more repeatable for an individual and have more consistency between users. For phenomena with very high variability even 20 users per condition may be too few, although within subjects designs may help if possible. Pilot experiments or previous experiments concerning the same phenomenon are important, but this is probably the time to consult a statistician who can assess the statistical ‘power’ of a suggested design (the likelihood that it will reveal the issue of interest).

qualitative study

Here none of the above applies and … so … well … hmm how do you decide how many users? Often people rely on ‘professional judgement’, which is a posh way of saying “finger in the air”.

In fact, some of the numerical arguments above do still apply (sorry numerophobes). If as part of your qualitative study you are interested in a behaviour that you believe happens about half the time, then with five users you would be very unlucky not to observe it (3% of the time). Or put it another way, if you observe five users you will see around 97% of behaviours that at least half of all users have (with loads and loads of assumptions!).

If you are interested in rarer phenomena, then you need either lots more users (for behaviour that you only see in 1 in 10 users, then you have only a 40% chance of observing it with 5 users, and perhaps more surprisingly, only 65% chance of seeing it with 10 users).

However, if you are interested in a particular phenomenon, then randomly choosing people is not the way to go anyway, you are obviously going to select people who you feel are most likely to exhibit it; the aim is not to assess its prevalence in the world, but to find a few and see what happens.

Crucially when you generalise from qualitative results you do it differently.

Now in fact you will see many qualitative papers that add caveats to say “our results only apply to the group studied …”. This may be necessary to satisfy certain reviewers, but is at best disingenuous – if you really believe the results of your qualitative work do not generalise at all, then why are you publishing it – telling me things that I cannot use?

In fact, we do generalise from qualitative work, with care, noting the particular limitations of the groups studied, but still assume that the results are of use beyond the five, ten or one hundred people that we observed. However, we do not generalise through statistics, or from the raw data, but through reasoning that certain phenomena, even if only observed once, are likely to be ones that will be seen again, even if differing in details. We always generalise from our heads, not from data.

Whether it is one, five or more, by its nature deep qualitative data will involve fewer users than more shallow methods such as large scale experiments or surveys. I often find that the value of this kind of deep interpretative data is enhanced by seeing it alongside large-scale shallow data. For example, if survey or log data reveals that 70% of users have a particular problem and you observe two users having the same problem, then it is not unreasonable to assume that the reasons for the problem are similar to those of the large sample — yes you can generalise from two!

Indeed one user may be sufficient (as often happens with medical case histories, or business case studies), but often it is about getting enough users so that interesting things turn up.

exploratory study

This looking for interesting things is often the purpose of research: finding a problem to tackle. Once we have found an interesting issue, we may address it in many ways: formal experiments, design solutions, qualitative studies; but none of these are possible without something interesting to look at.

In such situations, as we saw with qualitative studies in general, the sole criteria for “is N enough” is whether you have learnt something.

If you want to see all, or most of the common phenomena, then you need lots of users. However, if you just want to find one interesting one, then you only need as many as gets you there. Furthermore whilst you often choose ‘representative or ‘typical’ users (what is a typical user!) for most kinds of study and evaluation, for exploratory analysis, often extreme users are most insightful; of course you have to work out whether your user or users are so unusual that the things you observe are unique to them … but again real research comes from the head, you have to think about it and make an assessment.

In the IwC paper I discuss some of the issues of single person studies in more detail and Fariza Razak’s thesis is all about this.

the right kind of users

If you have five, fifty or five hundred users, but they are all psychology undergraduates, they are not going to tell you much about usage by elderly users, or by young unemployed people who have left school without qualifications.

Again the results of research ultimately come from the head not the data: you will never get a complete typical, or representative sample of users; the crucial thing is to understand the nature of the users you are studying, and to make an assessment of whether the effects you see in them are relevant, and interesting more widely. If you are measuring reaction times, then education may not be a significant factor, but Game Boy use may be.

Many years ago I was approached by a political science PhD student. He had survey data from over 200 people (not just five!), and wanted to know how to calculate error bars to go on his graphs. This was easily done and I explained the procedure (a more systematic version of the short account given earlier). However, I was more interested in the selection of those 200 people. They were Members of Parliament; he had sent the survey to every MP (all 650 of them) and got over 200 replies, a 30% return rate, which is excellent for any survey. However, this was a self-selected group and so I was more interested in whether the grounds for self-selection influenced the answers than in how many of them there were. It is often the case that those with strong views on a topic are more likely to answer surveys on it. The procedure he had used was as good as possible, but, in order to be able to make any sort of statement about the interpretation of the data, he needed to make a judgement. Yet again knowledge is ultimately from the head not the data.

For small numbers of users these choices are far more critical. Do you try and choose a number of similar people, so you can contrast them, or very different so that you get a spread? There is no right answer, but if you imagine having done the study and interpreting the results this can often help you to see the best choice for your circumstances.

being practical

In reality whether choosing how many, or who, to study, we are driven by availability. It is nice to imagine that we make objective selections based on some optimal criteria — but life is not like that. In reality, the number and kind of users we study is determined by the number and kind of users we can recruit. The key thing is to understand the implications of these ‘choices’ and use these in your interpretation.

As a reviewer I would prefer honesty here, to know how and why users were selected so that I can assess the impact of this on the results. But that is a counsel of perfection, and again good science and getting published are not the same thing! Happily there are lovely euphemisms such as ‘convenience sample’ (who I could find) and ‘snowball sample’ (friends of friends, and friends of friends of friends), which allow honesty without insulting reviewers’ academic sensibilities.

in the end

Is five users enough? It depends: one, five, fifty or one thousand (Google test live with millions!). Think about what you want out of the study: numbers, ideas, faults to fix, and the confidence and coverage of issues you are interested in, and let that determine the number.

And, if I’ve not said it enough already, in the end good research comes from your head, from thinking and understanding the users, the questions you are posing, not from the fact that you had five users.

references

A. Dix (2010) Human-Computer Interaction: a stable discipline, a nascent science, and the growth of the long tail. Interacting with Computers, 22(1) pp. 13-27. http://www.hcibook.com/alan/papers/IwC-LongFsch-HCI-2010/

Nielsen, J. (1989). Usability engineering at a discount. In Salvendy, G., and Smith, M.J. (Eds.), Designing and Using Human–Computer Interfaces and Knowledge Based Systems, Elsevier Science Publishers, Amsterdam. 394-401.

Nielsen, J. and Landauer, T. K. 1993. A mathematical model of the finding of usability problems. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands, April 24 ? 29, 1993). CHI ’93. ACM, New York, NY, 206?213. http://doi.acm.org/10.1145/169059.169166

Jakob Nielsen’s Alertbox, March 19, 2000: Why You Only Need to Test With 5 Users. http://www.useit.com/alertbox/20000319.html

Fariza Razak (2008). Single Person Study: Methodological Issues. PhD Thesis. Computing Department, Lancaster University, UK. February 2008. http://www.hcibook.net/people/Fariza/

Michael Twidale and Paul Marty (2004-) Extreme Evaluation. http://people.lis.uiuc.edu/~twidale/research/xe/

the ordinary and the normal

Posted on March 30, 2009 by alan

I am reading Michel de Certeau’s “The Practice of Everyday Life“. The first chapter begins:

The erosion and denigration of the singular or the extraordinary was announced by The Man Without Qualities¹: “…a heroism but enormous and collective, in the model of ants” And indeed the advent of the anthill society began with the masses, … The tide rose. Next it reached the managers … and finally it invaded the liberal professions that thought themselves protected against it, including even men of letters and artists.”

Now I have always hated the word ‘normal’, although loved the ‘ordinary’. This sounds contradictory as they mean almost the same, but the words carry such different connotations. If you are not normal you are ‘subnormal’ or ‘abnormal’, either lacking in something or perverted. To be normal is to be normalised, to be part of the crowd, to obey the norms, but to be distinctive or different is wrong. Normal is fundamentally fascist.

In contrast the ordinary does not carry the same value judgement. To be different from ordinary is to be extra-ordinary², not sub-ordinary or ab-ordinary. Ordinariness does not condemn otherness.

Certeau is studying the everyday. The quote is ultimately about the apparently relentless rise of the normal over the ordinary, whereas Certeau revels in the small ways ordinary people subvert norms and create places within the interstices of the normal.

The more I study the ordinary, the mundane, the quotidian, the more I discover how extraordinary is the everyday³. Both the ethnographer and the comedian are expert at making strange, taking up the things that are taken for granted and holding them for us to see, as if for the first time. Walk down an anodyne (normalised) shopping street, and then look up from the facsimile store fronts and suddenly cloned city centres become architecturally unique. Then look through the crowd and amongst the myriad incidents and lives around, see one at a time, each different.

Sometimes it seems as if the world conspires to remove this individuality. The InfoLab21 building that houses the Computing Dept. at Lancaster was sort listed for a people-centric design award of ‘best corporate workspace‘. Before the judging we had to remove any notices from doors or any other sign that the building was occupied, nothing individual, nothing ordinary, sanitised, normalised.

However, all is not lost. I was really pleased the other day to see a paper “Making Place for Clutter and Other Ideas of Home”⁴. Laural, Alex and Richard are looking at the way people manage the clutter in their homes: keys in bowls to keep them safe, or bowls on a worktop ready to be used. They are looking at the real lives of ordinary people, not the normalised homes of design magazines, where no half-drunk coffee cup graces the coffee table, nor the high-tech smart homes where misplaced papers will confuse the sensors.

Like Fariza’s work on designing for one person⁵, “Making a Place for Clutter” is focused on single case studies not broad surveys. It is not that the data one gets from broader surveys and statistics is not important (I am a mathematician and a statistician!), but read without care the numbers can obscure the individual and devalue the unique. I heard once that Stalin said, “a million dead in Siberia is a statistic, but one old woman killed crossing the road is a national disaster”. The problem is that he could not see that each of the million was one person too. “Aren’t two sparrows sold for only a penny? But your Father knows when any one of them falls to the ground.”⁶.

We are ordinary and we are special.

The Man without Qualities, Robert Musil, 1930-42, originally: Der Mann ohne Eigenschafte. Picador Edition 1997, Trans. Sophie Wilkins and Burton Pike: Amazon | Wikipedia[back]
Sometimes ‘extraordinary’ may be ‘better than’, but more often simply ‘different from’, literally the Latin ‘extra’ = ‘outside of’[back]
as in my post about the dinosaur joke![back]
Swan, L., Taylor, A. S., and Harper, R. 2008. Making place for clutter and other ideas of home. ACM Trans. Comput.-Hum. Interact. 15, 2 (Jul. 2008), 1-24. DOI= http://doi.acm.org/10.1145/1375761.1375764[back]
Described in Fariza’s thesis: Single Person Study: Methodological Issues and in the notes of my SIGCHI Ireland Inaugural Lecture Human-Computer Interaction in the early 21st century: a stable discipline, a nascent science, and the growth of the long tail.[back]
Matthew 10:29[back]

Touching Technology

Posted on March 28, 2009 by alan

I’ve given a number of talks over recent months on aspects of physicality, twice during winter schools in Switzerland and India that I blogged about (From Anzere in the Alps to the Taj Bangelore in two weeks) a month or so back, and twice during my visit to Athens and Tripolis a few weeks ago.

I have finished writing up the notes of the talks as “Touching Technology: taking the physical world seriously in digital design“. The notes are partly a summary of material presented in previous papers and also some new material. Here is the abstract:

Although we live in an increasingly digital world, our bodies and minds are designed to interact with the physical. When designing purely physical artefacts we do not need to understand how their physicality makes them work – they simply have it. However, as we design hybrid physical/digital products, we must now understand what we lose or confuse by the added digitality. With two and half millennia of philosophical ponderings since Plato and Aristotle, several hundred years of modern science, and perhaps one hundred and fifty years of near modern engineering – surely we know sufficient about the physical for ordinary product design? While this may be true of the physical properties themselves, it is not the fact for the way people interact with and rely on those properties. It is only when the nature of physicality is perturbed by the unusual and, in particular the digital, that it becomes clear what is and is not central to our understanding of the world. This talk discusses some of the obvious and not so obvious properties that make physical objects different from digital ones. We see how we can model the physical aspects of devices and how these interact with digital functionality.

After finishing typing up the notes I realised I have become worryingly scholarly – 59 references and it is just notes of the talk!

Alan looking scholarly

Alan Dix

Tag Archives: human computer interaction