Web Art/Science Camp — how web killed the hypertext star and other stories

Had a great day on Saturday at the at the Web Art/Science Camp (twitter: #webartsci , lanyrd: web-art-science-camp). It was the first event that I went to primarily with my Talis hat on and first Web Science event, so very pleased that Clare Hooper told me about it during the DESIRE Summer School.

The event started on Friday night with a lovely meal in the restaurant at the British Museum. The museum was partially closed in the evening, but in the open galleries Rosetta Stone, Elgin Marbles and a couple of enormous totem poles all very impressive. … and I notice the BM’s website when it describes the Parthenon Sculptures does not waste the opportunity to tell us why they should not be returned to Greece!

Treasury of Atreus

I was fascinated too by images of the “Treasury of Atreus” (which is actually a Greek tomb and also known as the Tomb of Agamemnon. The tomb has a corbelled arch (triangular stepped stones, as visible in the photo) in order to relieve load on the lintel. However, whilst the corbelled arch was an important technological innovation, the aesthetics of the time meant they covered up the triangular opening with thin slabs of fascia stone and made it look as though lintel was actually supporting the wall above — rather like modern concrete buildings with decorative classical columns.

how web killed the hypertext star

On Saturday, the camp proper started with Paul de Bra from TU/e giving a sort of retrospective on pre-web hypertext research and whether there is any need for hypertext research anymore. The talk brought out several of the issues that have worried me also for some time; so many of the lessons of the early hypertext lost in the web1.

For me one of the most significant issues is external linkage. HTML embeds links in the document using <a> anchor tags, so that only the links that the author has thought of can be present (and only one link per anchor). In contrast, mature pre-web hypertext systems, such as Microcosm2, specified links eternally to the document, so that third parties could add annotation and links. I had a few great chats about this with one of the Southampton Web Science DTC students; in particular, about whether Google or Wikipedia effectively provide all the external links one needs.

Paul’s brief history of hypertext started, predictably, with Vannevar Bush‘s  “As We May Think” and Memex; however he pointed out that Bush’s vision was based on associative connections (like the human mind) and trails (a form of narrative), not pairwise hypertext links. The latter reminded me of Nick Hammond’s bus tour metaphor for guided educational hypertext in the 1980s — occasionally since I have seen things a little like this, and indeed narrative was an issue that arose in different guises throughout the day.

While Bush’s trails are at least related to the links of later hypertext and the web, the idea of associative connections seem to have been virtually forgotten.  More recently in the web however, IR (information retrieval) based approaches for page suggestions like Alexa and content-based social networking have elements of associative linking as does the use of spreading activation in web contexts3

It was of course Nelson who coined the term hypertext, but Paul reminded us that Ted Nelson’s vision of hypertext in Xanadu is far richer than the current web.  As well as external linkage (and indeed more complex forms in his ZigZag structures, a form of faceted navigation.), Xanadu’s linking was often in the form of transclusions pieces of one document appearing, quoted, in another. Nelson was particularly keen on having only one copy of anything, hence the transclusion is not so much a copy as a reference to a portion. The idea of having exactly one copy seems a bit of computing obsession, and in non-technical writing it is common to have quotations that are in some way edited (elision, emphasis), but the core thing to me seems to be the fact that the target of a link as well as the source need not be the whole document, but some fragment.

Paul de Bra's keynote at Web Art/Science Camp (photo Clare Hooper)

Over a period 30 years hypertext developed and started to mature … until in the early 1990s came the web and so much of hypertext died with its birth … I guess a bit like the way Java all but stiltified programming languages. Paul had a lovely list of bad things about the web compared with (1990s) state of the art hypertext:

Key properties/limitations in the basic Web:

  1. uni-directional links between single nodes
  2. links are not objects (have no properties of their own)
  3. links are hardwired to their source anchor
  4. only pre-authored link destinations are possible
  5. monolithic browser
  6. static content, limited dynamic content through CGI
  7. links can break
  8. no transclusion of text, only of images

Note that 1, 3 and 4 are all connected with the way that HTML embeds links in pages rather than adopting some form of external linkage. However, 2 is also interesting; the fact that links are not ‘first class objects’. This has been preserved in the semantic web where an RDF triple is not itself easily referenced (except by complex ‘reification’) and so it is hard to add information about relationships such as provenance.

Of course, this same simplicity (or even that it was simplistic) that reduced the expressivity of HTML compared with earlier hypertext is also the reasons for its success compared with earlier more heavy weight and usually centralised solutions.

However, Paul went on to describe how many of the features that were lost have re-emerged in plugins, server enhancements (this made me think of systems such as zLinks, which start to add an element of external linkage). I wasn’t totally convinced as these features are still largely in research prototypes and not entered the mainstream, but it made a good end to the story!

demos and documentation

There was a demo session as well as some short demos as part of talks. Lots’s of interesting ideas. One that particularly caught my eye (although not incredibly webby) was Ana Nelson‘s documentation generator “dexy” (not to be confused with doxygen, another documentation generator). Dexy allows you to include code and output, including screen shots, in documentation (LaTeX, HTML, even Word if you work a little) and live updates the documentation as the code updates (at least updates the code and output, you need to change the words!). It seems to be both a test harness and multi-version documentation compiler all in one!

I recall that many years ago, while he was still at York, Harold Thimbleby was doing something a little similar when he was working on his C version of Knuth’s WEB literate programming system. Ana’s system is language neutral and takes advantage of recent developments, in particular the use of VMs to be able to test install scripts and to be sure to run code in a consistent environments. Also it can use browser automation for web docs — very cool 🙂

Relating back to Paul’s keynote this is exactly an example of Nelson’s transclusion — the code and outputs included in the document but still tied to their original source.

And on this same theme I demoed Snip!t as an example of both:

  1. attempting to bookmark parts of web pages, a form of transclusion
  2. using data detectors a form of external linkage

Another talk/demo also showed how Compendium could be used to annotate video (in the talk regarding fashion design) and build rationale around … yet another example of external linkage in action.

… and when looking after the event at some of Weigang Wang‘s work on collaborative hypermedia it was pleasing to see that it uses a theoretical framework for shared understanding in collaboratuve hypermedia that builds upon my own CSCW framework from the early 1990s 🙂

sessions: narrative, creativity and the absurd

Impossible to capture in a few words, but one session included different talks and discussion about the relation of narrative and various forms of web experiences — including a talk on the cognitive psychology of the Kafkaesque. Also discussion of creativity with Nathan live recording in IBIS!

what is web science

I guess inevitably in a new area there was some discussion about “what is web science” and even “is web science a discipline”. I recall similar discussions about the nature of HCI 25 years ago and not entirely resolved today … and, as an artist who was there reminded us, they still struggle with “what is art?”!

Whether or not there is a well defined discipline of ‘web science’, the web definitely throws up new issues for many disciplines including new challenges for computing in terms of scale, and new opportunities for the social sciences in terms of intrinsically documented social interactions. One of the themes that recurred to distinguish web science from simply web technology is the human element — joy to my ears of course as a HCI man, but I think maybe not the whole story.

Certainly the gathering of people from different backgrounds in a sort of disciplinary bohemia is exciting whether or not it has a definition.

  1. see also “Names, URIs and why the web discards 50 years of computing experience“[back]
  2. Wendy Hall, Hugh Davis and Gerard Hutchings, “Rethinking Hypermedia:: The Microcosm Approach, Springer, 1996.[back]
  3. Spreading activation is used by a number of people, some of my own work with others at Athens, Rome and Talis is reported in “Ontologies and the Brain: Using Spreading Activation through Ontologies to Support Personal Interaction” and “Spreading Activation Over Ontology-Based Resources: From Personal Context To Web Scale Reasoning“.[back]

UK internet far from ubiquitous

On the last page of the Guardian on Saturday (13th Oct) in a sort of ‘interesting numbers’ section, they say that:

“30% of the UK population have no internet access at home”

I couldn’t find the exact source of this, however, another  guardian article “UK internet audience rises by 1.9 million over last year” dated Wednesday 30 June 2010 has a similar figure.  This says that Internet use  has grown to 38.8 million. The National Statistics office say the overall UK population is 61,792,000 with 1/5 under 16, so call that 2 in 16 under 10 or around 8 million. That gives an overall population of a little under 54 million over 10 years old, that is still only 70% actually using the web at all.

My guess is that some of the people with internet at home do not use it, and some of the ones without home connections use it using other means (mobile, use at school, cyber cafe’s), but by both measures we are hardly a society where the web is as ubiquitous as one might have imagined.

wisdom of the crowds goes to court

Expert witnesses often testify in court cases whether on DNA evidence, IT security or blood splatter patterns.  However, in the days of Web 2.0 who is the ‘expert’ witness?  Would then true Web 2.0 court submit evidence to public comments, maybe, like the Viking Thing or Wild West lynch mob, a vote of the masses using Facebook ‘Like’ could determine guilt or innocence.

However, it will be a conventional judge, not the justice of social networks, who will adjudicate if the hoteliers threatening to sue TripAdvisor1 do indeed bring the case to court. When TripAdvisor seeks to defend its case, they will not rely on crowd-sourced legal opinions, but lawyers whose advice is trusted because they are trained, examined and experienced and who are held responsible for their advice.  What is at stake is precisely the fact that TripAdvisor’s own site has none of these characteristics.

This may well, like the Shetland newspaper case in the 1990s2, become a critical precedent for many crowd-sourced sites and so is something we should all be watching.

Unlike Wikipedia or legal advice itself, ‘expertise’ is not the key issue in the case of TripAdvisior: every hotel guest is in a way the best expert as to their own experience.  However, how is the reader to know that the reviews posted are really by disgruntled guests rather than business rivals?  In science we are expected to declare sources of research funding, so that the reader can make judgements on the reliability of evidence funded by the tobacco or oil industry or indeed the burgeoning renewables sector.  Those who flout these conventions and rules may expect their papers to be withdrawn and their careers to flounder.  Similarly if I make a defamatory public statement about friend, colleague or public figure, then not only can the reliability of my words be determined by my own reputation for trustworthiness, but if my words turn out to be knowingly or culpably false and damaging then I can be sued for libel.   In the case of TripAdvisor there are none of the checks and balances of science or the law and yet the impact on individual hoteliers can make or break their business.    Who is responsible for damage caused by any untrue or malicious reviews posted on the site: the anonymous ‘crowd’ or TripAdvisor?

Of course users of review sites are not stupid, they know (or do they) that anonymous reviews should be taken with a pinch of salt.  My guess is that a crucial aspect of the case may be the extent to which TripAdvisor itself appears to lend credence to the reviews it publishes.  Indeed every page of TripAdvisior is headed with their strap line “World’s most trusted travel advice™”.

At the top of the home page there is also the phrase “Find Hotels Travelers Trust” and further down, “Whether you prefer worldwide hotel chains or cozy boutique hotels, you’ll find real hotel reviews you can trust at TripAdvisor“.  The former arguably puts the issue of trust back to the reviewers, but the latter is definitely TripAdvisor asserting to the trustworthiness of the reviews.

I think if I were in TripAdvisor I would be worried!

Issues of trust and reliability, provenance and responsibility are also going to be an important strand of the work I’ll be part of myself  at Talis: how do we assess the authority of crowd-sourced material, how do we convey to users the level of reliability of the information they view, especially if it is ‘mashed’ from different sources, how do we track the provenance of information in order to be able to do this?   Talis is interested because as a major provider and facilitator of open data, the reliability of the information it and its clients provide is a crucial part of that information — unreliable information is not information!

However, these issues are critical for everyone involved in the new web; if those of us engaged in research and practice in IT do not address these key problems then the courts will.

  1. see The Independent, “Hoteliers to take their revenge on TripAdvisor’s critiques in court“, Saturday 11th Sept. 2010[back]
  2. The case around 1996/1997 involved the Shetland Times obtaining a copyright against ‘deep linking’ by the rival Shetland News, that is links directly to news stories bypassing the Shetland News home page.  This was widely reported at the time and became an important case in Internet law: see, for example, Nov 1996 BBC News story or netlitigation.com article.  The out of court settlement allowed the deep linking so long as the link was clearly acknowledged.  However, while the settlement was sensible, the uncertainty left by the case pervaded the industry for years, leading to some sites abandoning link pages, or only linking after obtaining explicit permissions, thus stifling the link-economy of the web. [back]

fixing hung iCal

iCal hung on a sync with Google calendars and kept hanging everytime I restarted it, even after restarting the whole machine.

I found some advice on this in a few posts.

One “Fix an iCal ‘application not responding’ occasional hang” was more about occasional long pauses and suggested selecting”Reset Sync History” in  “iSync » Preferences”.  Another  “Fix an iCal hang due to system date reset” suggested resetting the ‘lastHearBeatDate‘ in Library/Preferences/com.apple.iCal.plist. Neither worked, but prompted by the latter I used TimeMachine (yawn yawn how do they make it sooooo sloooow), to restore copies of all the iCal plist files in Library/Preferences/, but again to no avail.

So several good suggestions, but none worked.

Happily I saw a comment lower down on “Fix an iCal hang due to system date reset” which suggested moving the complete ~/Library/Calendars folder out to the desktop and then recopying the calendar files in one by one after restarting iCal. I didn’t do this as such, but instead in ~/Library/Calendars there are a number of ‘Calendar Cache‘ files and also a folder labelled Calendar Sync Changes. I removed these, restarted and … it works 🙂

Hardly easy for the end user though :-/

Struggling with Heidegger

Heidegger and hammers have been part of HCI’s conceptualisation from pretty much as long as I can recall.  Although maybe I first heard the words at some sort of day workshop in the late 1980s as the hammer example as used in HCI annoyed me even then, so let’s start with hammers.

hammers

I should explain that problems with the hammer example are not my current struggles with Heidegger!  For the hammer it is just that Heidegger’s ‘ready at hand’ is often confused with ‘walk up and use’.  In  Heidegger ready-at-hand refers to the way one is focused on the nail, or wood to be joined, not the hammer itself:

“The work to be produced is the “towards which” of such things as the hammer, the plane, and the needle” (Being and Time1, p.70/99)

To be ‘ready to hand’ like this typically requires familiarity with the equipment (another big Heidegger word!), and is very different from the way a cash machine or tourist information systems should be in some ways accessible independent of prior knowledge (or at least only generic knowledge and skills).

My especial annoyance with the hammer example stems from the fact that my father was a carpenter and I reckon it took me around 10 years to learn how to use a hammer properly2!  Even holding it properly is not obvious, look at the picture.

There is a hand sized depression in the middle.  If you have read Norman’s POET you will think, “ah yes perceptual affordance’, and grasp it like this:

But no that is not the way to hold it!  If try to use it like this you end up using the strength of your arm to knock in the nail and not the weight of the hammer.

Give it to a child, surely the ultimate test of ‘walk up and use’, and they often grasp the head like this.

In fact this is quite sensible for a child as a ‘proper’ grip would put too much strain on their wrist.  Recall  Gibson’s definition of affordance was relational3, about the ecological fit between the object and the potential actions, and the actions depends on who is doing the acting.  For a small child with weaker arms the hammer probably only affords use at all with this grip.

In fact the ‘proper’ grip is to hold it quite near the end where you can use the maximum swing of the hammer to make most use of the weight of the hammer and its angular momentum:

Anyway, I think maybe Heidegger knew this even if many who quote him don’t!

Heidegger

OK, so its alright me complaining about other people mis-using Heidegger, but I am in the middle of writing one of the chapters for TouchIT and so need to make sure I don’t get it wrong myself … and there my struggles begin.  I need to write about ready-to-hand and present-to-hand.   I thought I understood them, but always it has been from secondary sources and as I sat with Being and Time in one hand, my Oxford Companion to Philosophy in another and various other books in my teeth … I began to doubt.

First of all what I thought the distinction was:

  • ready at hand — when you are using the tool and it is invisible to you, you just focus on the work to be done with it
  • present at hand — when there is some sort of breakdown, the hammer head is loose or you don’t have the right tool to hand and so start to focus on the tools themsleves rather than on the job at hand

Scanning the internet this is certainly what others think, for example blog posts at 251 philosophy and Matt Webb at Berg4.  Koschmann, Kuutti and Hickman produced an excellent comparison of breakdown in Heidegger, Leont’ev and Dewey5, and from this it looks as though the above distinction maybe comes Dreyfus summary of Heidegger — but again I don’t have a copy of Dreyfus’ “Being-in-the-World“, so not certain.

Now this is an important distinction, and one that Heidegger certainly makes.  The first part is very clearly what Heidegger means by ready-to-hand:

“The peculiarity of what is proximally to hand is that, in its readiness-to-hand, it must, as it were, withdraw … that with which we concern ourselves primarily is the work …” (B&T, p.69/99)

The second point Heidegger also makes at length distinguishing at least three kinds of breakdown situation.  It just seems a lot less clear whether ‘present-at-hand’ is really the right term for it.  Certainly the ‘present-at-hand’ quality of an artefact becomes foregrounded during breakdown:

“Pure presence at hand announces itself in such equipment, but only to withdraw to the readiness-in-hand with which one concerns oneself — that is to say, of the sort of thing we find when we put it back into repair.” (B&T, p.73/103)

But the preceeding sentance says

“it shows itself as an equipmental Thing which looks so and so, and which, in its readiness-to-hand as looking that way, has constantly been present-at-hand too.” (B&T, p.73/103)

That is present-at-hand is not so much in contrast to ready-at-hand, but in a sense ‘there all along’; the difference is that during breakdown the presence-at-hand becomes foregrounded. Indeed when ‘present-at-hand’ is first introduced Heidegger appears to be using it as a binary distinction between Dasein, (human) entities that exist and ponder their existence, and other entities such as a table, rock or tree (p. 42/67).  The contrast is not so much between ready-to-hand and present-to-hand, but between ready-to-hand and ‘just present-at-hand’ (p.71/101) or ‘Being-just-present-at-hand-and-no-more’ (p.73/103). For Heidegger to seems not so much that ‘ready-to-hand’ stands in in opposition to ‘present-to-hand’; it is just more significant.

To put this in context, traditional philosophy had focused exclusively on the more categorically defined aspects of things as they are in the world (‘existentia’/present-at-hand), whilst ignoring the primary way they are encountered by us (Dasein, real knowing existence) as ready-to-hand, invisible in their purposefulness.  Heidegger seeks to redress this.

“If we look at Things just ‘theoretically’, we can get along without understanding readiness-to-hand.” (B&T p.69/98)

Heidegger wants to avoid the speculation of previous science and philosophy. Although it is not a Heidegger word, I use ‘speculation’ here with all of its connotations, pondering at a distance, but without commitment, or like spectators at a sports stadium looking in at something distant and other.  In contrast, ready-to-hand suggests commitment, being actively ‘in the world’ and even when Heidegger talks about those moments when an entity ceases to be ready-to-hand and is seen as present-to-hand, he uses the term circumspection — a casting of the eye around, so that the Dasein, the person, is in the centre.

So present-at-hand is simply the mode of being of the entities that are not Dasein (aware of their own existence), but our primary mode of experience of them and thus in a sense the essence of their real existence is when they are ready-to-hand.  I note Roderick Munday’s useful “Glossary of Terms in Being and Time” highlights just this broader sense of present-at-hand.

Maybe the confusion arises because Heidegger’s concern is phenomenological and so when an artefact is ready-to-hand and its presence-to-hand ‘withdraws’, in a sense it is no longer present-to-hand as this is no longer a phenomenon; and yet he also seems to hold a foot in realism and so in another sense it is still present-to-hand.  In discussing this tension between realism and idealism in Heidegger, Stepanich6 distinguishes present-at-hand and ready-to-hand, from presence-to-hand and readiness-to-hand — however no-one else does this so maybe that is a little too subtle!

To end this section (almost) with Heidegger’s words, a key statement, often quoted, seems to say precisely what I have argued above, or maybe precisely the opposite:

“Yet only by reason of something present-at-hand ‘is there’ anything ready-to-hand.  Does it follow, however, granting this thesis for the nonce, that readiness-to-hand is ontologically founded upon presence-at-hand?” (B&T, p.71/101)

What sort of philosopher makes a key point through a rhetorical question?

So, for TouchIT, maybe my safest course is to follow the example of the Oxford Companion to Philosophy, which describes ready-to-hand, but circumspectly never mentions present-to-hand at all?

and anyway what’s wrong with …

On a last note there is another confusion, or maybe mistaken attitude, that seems to be common when referring to ready-to-hand.  Heidegger’s concern was in ontology, understanding the nature of being, and so he asserted the ontological primacy of the ready-to-hand, especially in light of the previous dominant concerns of philosophy.  However, in HCI, where we are interested not in the philosophical question, but the pragmatic one of utility, usability, and experience, Heidegger is often misapplied as a kind of fetishism of engagement, as if everything should be ready-to-hand all the time.

Of course for many purposes this is correct, as I type I do not want to be aware of the keys I press, not even of the pages of the book that I turn.

Yet there is also a merit in breaking this engagement, to encourage reflection and indeed the circumspection that Heidegger discusses.  Indeed Gaver et al.’s focus on ambiguity in design7 is often just to encourage that reflection and questioning, bringing things to the foreground that were once background.

Furthermore as HCI practitioners and academics we need to both take seriously the ready-to-hand-ness of effective design, but also (just as Heidegger is doing) actually look at the ready-to-hand-ness of things seeing them and their use not taking them for granted.  I constantly strive to find ways to become aware of the mundane, and offer students tools for estrangement to look at the world askance8.

“To lay bare what is  just present-at-hand and no more, cognition must first penetrate beyond what is ready-to-hand in our concern.” (B&T, p.71/101)

This ability to step out and be aware of what we are doing is precisely the quality that Schon recognises as being critical for the ‘Reflective Practioner‘.  Indeed, my practical advice on using the hammer in the footnotes below comes precisely through reflection on hammering, and breakdowns in hammering, not through the times when the hammer was ready-to-hand..

Heidegger is indeed right that our primary existence is being in the world, not abstractly viewing it from afar.  And yet, sometimes also, just as Heidegger himself did as he pondered and wrote about these issues, one of our crowning glories as human beings is precisely that we are able also in a sense to step outside ourselves and look in wonder.

  1. In common with much of the literature the page references to Being and Time are all of the form p.70/99 where the first number refers to the page numbers in the original German (which I have not read!) and the second number to the page in Macquarrie and Robinson’s translation of Being and Time published by Blackwell.[back]
  2. Practical hammering – a few tips: The key thing is to focus on making sure the face of the hammer is perpendicular to the nail, if there is a slight angle the nail will bend.  For thin oval wire nails, if one does bend do not knock the nail back upright, most likely it will simply bend again and just snap.  Instead, simply hit the head of the nail while still bent, but keeping the hammer face perpendicular to the nail not the hole.  So long as the nail has cut any depth of hole it will simply follow its own path and straighten of its own accord.[back]
  3. James Gibson. The Ecological Approach to Visual Perception[back]
  4. Matt Webb’s post appears to be quoting Paul Dourish’ “Where the Action Is”, but I must have lent my copy to someone, so not sure of this is really what Paul thinks.[back]
  5. Koschmann, T., Kuutti, K. & Hickman, L. (1998). The Concept of Breakdown in Heidegger, Leont’ev, and Dewey and Its Implications for Education. Mind, Culture, and Activity, 5(1), 25-41. doi:10.1207/s15327884mca0501_3[back]
  6. Lambert Stepanich. “Heidegger: Between Idealism and Realism“, The Harvard Review of Philosophy, Vol 1. Spring 1991.[back]
  7. Bill Gaver, Jacob Beaver, and Steve Benford, 2003. Ambiguity as a resource for design. CHI ’03.[back]
  8. see previous posts on “mirrors and estrangement” and “the ordinary and the normal“[back]

Names, URIs and why the web discards 50 years of computing experience

Names and naming have always been a big issue both in computer science and philosophy, and a topic I have posted on before (see “names – a file by any other name“).

In computer science, and in particular programming languages, a whole vocabulary has arisen to talk about names: scope, binding, referential transparency. As in philosophy, it is typically the association between a name and its ‘meaning’ that is of interest. Names and words, whether in programming languages or day-to-day language, are, what philosophers call, ‘intentional‘: they refer to something else. In computer science the ‘something else’ is typically some data or code or a placeholder/variable containing data or code, and the key question of semantics or ‘meaning’ is about how to identify which variable, function or piece of data a name refers to in a particular context at a particular time.

The emphasis in computing has tended to be about:

(a) Making sure names have unambiguous meaning when looking locally inside code. Concerns such as referential transparency, avoiding dynamic binding and the deprecation of global variables are about this.

(b) Putting boundaries on where names can be seen/understood, both as a means to ensure (a) and also as part of encapsulation of semantics in object-based languages and abstract data types.

However, there has always been a tension between clarity of intention (in both the normal and philosophical sense) and abstraction/reuse. If names are totally unambiguous then it becomes impossible to say general things. Without a level of controlled ambiguity in language a legal statement such as “if a driver exceeds the speed limit they will be fined” would need to be stated separately for every citizen. Similarly in computing when we write:

function f(x) { return (x+1)*(x-1); }

The meaning of x is different when we use it in ‘f(2)’ or ‘f(3)’ and must be so to allow ‘f’ to be used generically. Crucially there is no internal ambiguity, the two ‘x’s refer to the same thing in a particular invocation of ‘f’, but the precise meaning of ‘x’ for each invocation is achieved by external binding (the argument list ‘(2)’).

Come the web and URLs and URIs.

Fiona@lovefibre was recently making a test copy of a website built using WordPress. In a pure html website, this is easy (so long as you have used relative or site-relative links within the site), you just copy the files and put them in the new location and they work 🙂 Occasionally a more dynamic site does need to know its global name (URL), for example if you want to send a link in an email, but this can usually be achieved using configuration file. For example, there is a development version of Snip!t at cardiff.snip!t.org (rather then www.snipit.org), and there is just one configuration file that needs to be changed between this test site and the live one.

Similarly in a pristine WordPress install there is just such a configuration file and one or two database entries. However, as soon as it has been used to create a site, the database content becomes filled with URLs. Some are in clear locations, but many are embedded within HTML fields or serialised plugin options. Copying and moving the database requires a series of SQL updates with string replacements matching the old site name and replacing it with the new — both tedious and needing extreme care not to corrupt the database in the process.

Is this just a case of WordPress being poorly engineered?

In fact I feel more a problem endemic in the web and driven largely by the URL.

Recently I was experimenting with Firefox extensions. Being a good 21st century programmer I simply found an existing extension that was roughly similar to what I was after and started to alter it. First of course I changed its name and then found I needed to make changes through pretty much every file in the extension as the knowledge of the extension name seemed to permeate to the lowest level of the code. To be fair XUL has mechanisms to achieve a level of encapsulation introducing local URIs through the ‘chrome:’ naming scheme and having been through the process once. I maybe understand a bit better how to design extensions to make them less reliant on the external name, and also which names need to be changed and which are more like the ‘x’ in the ‘f(x)’ example. However, despite this, the experience was so different to the levels of encapsulation I have learnt to take for granted in traditional programming.

Much of the trouble resides with the URL. Going back to the two issues of naming, the URL focuses strongly on (a) making the name unambiguous by having a single universal namespace;  URLs are a bit like saying “let’s not just refer to ‘Alan’, but ‘the person with UK National Insurance Number XXXX’ so we know precisely who we are talking about”. Of course this focus on uniqueness of naming has a consequential impact on generality and abstraction. There are many visitors on Tiree over the summer and maybe one day I meet one at the shop and then a few days later pass the same person out walking; I don’t need to know the persons NI number or URL in order to say it was the same person.

Back to Snip!t, over the summer I spent some time working on the XML-based extension mechanism. As soon as these became even slightly complex I found URLs sneaking in, just like the WordPress database 🙁 The use of namespaces in the XML file can reduce this by at least limiting full URLs to the XML header, but, still, embedded in every XML file are un-abstracted references … and my pride in keeping the test site and live site near identical was severely dented1.

In the years when the web was coming into being the Hypertext community had been reflecting on more than 30 years of practical experience, embodied particularly in the Dexter Model2. The Dexter model and some systems, such as Wendy Hall’s Microcosm3, incorporated external linkage; that is, the body of content had marked hot spots, but the association of these hot spots to other resources was in a separate external layer.

Sadly HTML opted for internal links in anchor and image tags in order to make html files self-contained, a pattern replicated across web technologies such as XML and RDF. At a practical level this is (i) why it is hard to have a single anchor link to multiple things, as was common in early Hypertext systems such as Intermedia, and (ii), as Fiona found, a real pain for maintenance!

  1. I actually resolved this by a nasty ‘hack’ of having internal functions alias the full site name when encountered and treating them as if they refer to the test site — very cludgy![back]
  2. Halasz, F. and Schwartz, M. 1994. The Dexter hypertext reference model. Commun. ACM 37, 2 (Feb. 1994), 30-39. DOI= http://doi.acm.org/10.1145/175235.175237[back]
  3. Hall, W., Davis, H., and Hutchings, G. 1996 Rethinking Hypermedia: the Microcosm Approach. Kluwer Academic Publishers.[back]

Time Machine – when it goes wrong and how to fix it

Unfortunately only fixing Mac OS X backup, not the Tardis 🙁 … but, nonetheless, critical.

What bit of software do you really need to be reliable?  If anything else goes really wrong you have the backup — but if the backup fails you really are lost.

And Mac OS X Time Machine, while it does have a very pretty  interface, is inclined to get stuck sometimes.

This is my own story of how it goes wrong … and how to put it right.

… and throughout I’ve dropped in a few lessons for anyone implementing critical system software — maybe the odd Apple engineer is reading

how to tell when things are wrong

Occasionally Time Machine seems to be stuck, but isn’t really.  When you first do a backup, or when you haven’t backed up to a particular disk for ages (perhaps if you have been away on a trip), it can spend several hours ‘preparing’.  You can tell it is ‘preparing’ because when you open the Time Machine preferences there is the little barbers pole saying ‘preparing’ 😉

This is when it is running over the disk working out what it needs to backup, and always seems to be the lengthiest operation, actually backing up the disk is often quite fast, and yet, for some reason there is no indication of how far through the ‘preparing’ process it has got.

Lesson 1: make sure you include progress indicators for anything that can take a while, not just the obvious ‘slow’ things.

So, when you see ‘preparing’, just be patient!

However, at least half-a-dozen times over the last year, my Time Machine has got completely stuck.  I have seen this happen in three ways:

(i)  it is still saying ‘preparing’ after leaving it overnight!

(ii)  it starts to transfer to disk, but then gets stuck part way:

(iii)  if you look in the Time Machine preferences it says the backup has failed

This last time in fact the first sign was (iii), but it doesn’t actually tell you (if you don’t look) until it has failed for ten days, by which time I was travelling.  In the days before Time Machine I always did a manual backup before travelling as I knew that was when things were most likely to go wrong, but now-a-days I have got used to relying on it and forget to check it is working OK … so if you are paranoid about your data, do peek occasionally at Time Machine to check it is still working!

When I got home and told Time Machine to backup to the Time Capsule here rather than my office disk (why can’t it remember that I have two backup disks??).  Then (after being very very patient while to was ‘preparing’ for four hours), I saw it got stuck in step (ii) at 1.4 GB or 4.2 GB.  Of course progress indicators are never very good for very slow operations, when transferring several GB of data there may be several minutes before the bar even moves a pixel … but I was very very patient and it definitely did not move!

Lesson 2: for very long processes supplement the progress indicator with some other indicator to show things are still working, in this case perhaps amount transferred in last minute

At this point I did the normal things, turn Time Machine On/Off,  restart machine a couple of times, etc., but when it persists then you know something is deeply wrong.

so why does it go wrong?

In fact Fiona@lovefibre has found Time Machine flawless for her desktop machine backing up to exactly the same Time Capsule.  I am guessing the problem I have is because I use a laptop so possible reasons:

  • it may go to sleep occasionally, breaking connection to the Time Capsule
  • maybe the WiFi aerial on a laptop is not as good as the desktop

However, if every laptop failed as often surely Apple would have fixed it by now.  So guessing there is an additional factor:

  • my disk has 196 Gb of data, much of it in smaller document files (word docs, code files, etc.), not just a few giant movies.

The software will be designed to withstand a certain amount of external failure, especially when connecting to disks over WiFi as the Time Capsule is designed to do.  However, I imagine that there are places in the code where there are race conditions, or critical portions where external failure really makes a difference.  If the external connections are reliable and the backup is quite fast the likelihood of hitting one of the nasty spots in the code is low.  However, if you have a lot of data to check and then transfer and the external failures more frequent, then the likelihood of hitting one increases and things start to go wrong.

I see similar problems with other software, Dreamweaver in particular, which has got better, but still can crash if the Internet connection is poor (see also “Why software need never hang“).  What happens is that during testing, the test machines often have minimal data, little software (maybe just the operating system and what is being tested), and operate in perfect situations.  In such circumstances these hidden flaws never become apparent.

Lesson 3: make sure your test machine is fully loaded with data and applications, and operates in an unreliable environment, so that testing is realistic

However, this is not like Word crashing and losing your most recent edits to one document.  When Time Machine fails it seems to occasionally leave something corrupt in the backup disk so that subsequent attempts to backup also fail.  There is no excuse for this, the techniques for dealing with potential disk-writing failures are well established in both databases and low-level disk management.  For example, one can save a timestamp file at the end of successful operations so that, when  returning to the data, if the timestamp file is not there the software knows something went wrong last time.

Maybe Time Machine is trying to be too clever, picking up where it left off when, for example, connection to the disk is broken.  If so it clearly needs some additional mechanism to notice “I’ve tried this several times and it keeps going wrong, maybe I need to back off to the last successful state”.  Perhaps not something to worry about in less critical software, but not difficult to get right when it is really needed … as in backups!

Lesson 4: build critical software defensively in layers so that errors in one part do not affect the whole; and if saving to disk ensure there is some sort of atomic transaction

The aim during testing should be what I call “fail-fast programming” trying to make sure that failures happen during testing not real use!

One thing I found particularly disturbing about my most recent Time Machine hang is that when I looked at the system console it had regular spats of “unknown SIGSEGV” several times a minute … in the kernel!  If you don’t know UNIX internals the ‘kernel’ is the heart of the operating system of the Mac, where all the lowest level work is done and where if something goes wrong everything fails.  SIGSEGV means that some bit of software is trying to access a memory location that doesn’t exist.  In fact while this is caught it is not so bad, the greater worry is that if it is trying to access non-existent memory, then it may corrupt other memory … and the kernel has access to everything – not good.

Please, please Apple if you cannot get Time Machine to work properly, do not let it affect the kernel!

how to put it right

One might hope that even if Time Machine cannot notice itself there is something wrong at least there would be an option to say “restart yourself”.  One might hope, but there is not.  However, you can do it yourself by digging a little into the backup disk itself.

First problem is to stop the Time Machine backup if it has hung.

In the Time Machine control panel, you can simply slide the OFF-ON button to OFF.  The status should change to ‘stopping’ and after a while stop.  Then you can restart the machine and try to fix things.

This is the ideal thing to do, but I find that when Time Machine is really hung this rarely works.  I do turn it to OFF, but either it never changes to ‘stopping’ and stays ‘preparing’, or it changes to ‘stopping’, but never does.  If this happens the system restart typically doesn’t restart the system as Time Machine won’t stop running.  Then, always with much trepidation, I reach for the on/off button on the Mac itself :-/

After doing a hard on/off like this, I usually do anther restart from the Apple menu … not sure if this is necessary, but just to be on the safe side!

Occasionally I skip to the next step before the hard restart.

Then you can start to fix the problem properly.

Find the backup disk.  If it is not obvious in the Finder use the ‘Go’ menu and select “Computer”; it shows all the locally connected disks (or it may simply appear in the left hand favourites pane in each Finder window).

If you skipped the restart stage (or of you just peek now to see what it is like when it hasn’t gone wrong), you will see something like “Backup of Alan Dix’s MacBook Pro” (obviously for you it will not be “Alan Dix’s MacBook Pro”!).  This is the Time Machine backup.  However, if you have restarted the machine with Time Machine off you will have to find the actual disk that you chose as your backup disk and on it look for a file called something like “Alan Dix’s MacBook  Pro_0039fc56f8a2.sparsebundle”.  This is some form of compressed disk image.  In the older versions of Time Machine there was simply a folder with all the backups in it — I felt much more secure.  Now this is a single opaque file and I worry that if one day it gets corrupted :-/

Having found the ‘sparsebundle’ double click it and it will display a little pop-up window that says ‘checking volumes’.  I keep meaning to see if this ever stops, but I am not patient enough and press the button that says to skip this state and then (after a while) it mounts the disk image and the disk “Backup of Alan Dix’s MacBook Pro” appears.

Double click “Backup of Alan Dix’s MacBook Pro” and look inside and then inside the folder “Backups_backupd” and you find loads of dated folders, which are the actual backups of your system that you can browse if you prefer instead of using the Time Machine interface.  In addition there may be one file ending “.inProgress”, which is some sort of internal file created while it is in the middle of doing the backup.

Delete the “.inProgress” file.

In addition, I usually delete the last of the dated folders (sort by “Date Modified” to get the last one).  However, if you don’t want to lose the last backup you can try just deleting the “inProgress” file and only delete the last dated backup if Time Machine still gets stuck.

Important: only delete the latest of the dated backup folders (e.g. “2010-06-09-225547” in the screen shot above), NOT the entire “Alan Dix’s Macbook Pro” folder.  If you do that you lose all your backups!

I recall doing this all with extreme trepidation the first time, but had got to the point when I couldn’t do backups or access them anyway so had nothing to lose.  Actually it seems pretty OK getting in here and doing this sort of thing, the nice thing about Time Machine is that it uses ordinary folder structures that you can peek around in and see are there all secure.  I am much happier with this than the kind of backup where you only know if it is working the day you try to restore something!  At least half the times I have used such backups over the years I’ve found the backup is in some way corrupt or incomplete. So actually one up for Time Machine 🙂

Now reboot again (for luck).  Turn Time Machine back on in the control panel and wait … a long time … it will start ‘preparing’ as if for the first backup … and several hours later hopefully all will be well.

But do remember to set the power save options not to go to sleep in the middle!

In fact the above has always worked for me except for this last time when, for some reason (maybe I missed something on the way?), it hung again and I had to go through the whole process again.  This time I waited until yesterday evening before turning Time Machine back on so that I could leave it to do the long 4 hour ‘preparing’ stage without me doing anything else.

And then:

Joy!

Phoenix rises – vfridge online again

vfridge is back!

I mentioned ‘Project Phoenix’ in my last previous post, and this was it – getting vfridge up and running again.

Ten years ago I was part of a dot.com company aQtive1 with Russell Beale, Andy Wood and others.  Just before it folded in the aftermath of the dot.com crash, aQtive spawned a small spin-off vfridge.com.  The virtual fridge was a social networking web site before the term existed, and while vfridge the company went the way of most dot.coms, for some time after I kept the vfridge web site running on Fiona’s servers until it gradually ‘decayed’ partly due to Javascript/DOM changes and partly due to Java’s interactions with mysql becoming unstable (note very, very old Java code!).  But it is now back online 🙂

The core idea of vfridge is placing small notes, photos and ‘magnets’ in a shareable web area that can be moved around and arranged like you might with notes held by magnets to a fridge door.

Underlying vfridge was what we called the websharer vision, which looked towards a web of user-generated content.  Now this is passé, but at the time  was directly counter to accepted wisdom and looking back seem prescient – remember this was written in 1999:

Although everyone isn’t a web developer, it is likely that soon everyone will become an Internet communicator — email, PC-voice-comms, bulletin boards, etc. For some this will be via a PC, for others using a web-phone, set-top box or Internet-enabled games console.

The web/Internet is not just a medium for publishing, but a potential shared place.

Everyone may be a web sharer — not a publisher of formal public ‘content’, but personal or semi-private sharing of informal ‘bits and pieces’ with family, friends, local community and virtual communities such as fan clubs.

This is not just a future for the cognoscenti, but for anyone who chats in the pub or wants to show granny in Scunthorpe the baby’s first photos.

Just over a year ago I thought it would be good to write a retrospective about vfridge in the light of the social networking revolution.  We did a poster “Designing a virtual fridge” about vfridge years ago at a Computers and Fun workshop, but have never written at length abut its design and development.  In particular it would be good to analyse the reasons, technical, social and commercial, why it did not ‘take off’ the time.  However, it is hard to do write about it without good screen shots, and could I find any? (Although now I have)  So I thought it would be good to revive it and now you can try it out again. I started with a few days effort last year at Christmas and Easter time (leisure activity), but now over the last week have at last used the fact that I have half my time unpaid and so free for my own activities … and it is done 🙂

The original vfridge was implemented using Java Servlets, but I have rebuilt it in PHP.  While the original development took over a year (starting down in Coornwall while on holiday watching the solar eclipse), this re-build took about 10 days effort, although of course with no design decisions needed.  The reason it took so much development back then is one of the things I want to consider when I write the retrospective.

As far as possible the actual behaviour and design is exactly as it was back in 2000 … and yes it does feel clunky, with lots of refreshing (remember no AJAX or web2.0 in those days) and of course loads of frames!  In fact there is a little cleverness that allowed some client-end processing pre-AJAX2.    Also the new implementation uses the same templates as the original one, although the expansion engine had to be rewritten in PHP.  In fact this template engine was one of our most re-used bits of Java code, although now of course many alternatives.  Maybe I will return to a discussion of that in another post.

I have even resurrected the old mobile interface.  Yes there were WAP phones even in 2000, albeit with tiny green and black screens.  I still recall the excitement I felt the first time I entered a note on the phone and saw it appear on a web page 🙂  However, this was one place I had to extensively edit the page templates as nothing seems to process WML anymore, so the WML had to be converted to plain-text-ish HTML, as close as possible to those old phones!  Looks rather odd on the iPhone :-/

So, if you were one of those who had an account back in 2000 (Panos Markopoulos used it to share his baby photos 🙂 ), then everything is still there just as you left it!

If not, then you can register now and play.

  1. The old aQtive website is still viewable at aqtive.org, but don’t try to install onCue, it was developed in the days of Windows NT.[back]
  2. One trick used the fact that you can get Javascript to pre-load images.  When the front-end Javascript code wanted to send information back to the server it preloaded an image URL that was really just to activate a back-end script.  The frames  used a change-propagation system, so that only those frames that were dependent on particular user actions were refreshed.  All of this is preserved in the current system, peek at the Javascript on the pages.    Maybe I’ll write about the details of these another time.[back]

Italian conferences: PPD10, AVI2010 and Search Computing

I got back from trip to Rome and Milan last Tuesday, this included the PPD10 workshop that Aaron, Lucia, Sri and I had organised, and the AVI 2008 conference, both in University of Rome “La Sapienza”, and a day workshop on Search Computing at Milan Polytechnic.

PPD10

The PPD10 workshop on Coupled Display Visual Interfaces1 followed on from a previous event, PPD08 at AVI 2008 and also a workshop on “Designing And Evaluating Mobile Phone-Based Interaction With Public Displays” at CHI2008.  The linking of public and private displays is something I’ve been interested in for some years and it was exciting to see some of the kinds of scenarios discussed at Lancaster as potential futures some years ago now being implemented over a range of technologies.  Many of the key issues and problems proposed then are still to be resolved and new ones arising, but certainly it seems the technology is ‘coming of age’.  As well as much work filling in the space of interactions, there were also papers that pushed some of the existing dimensions/classifications, in particular, Rasmus Gude’s paper on “Digital Hospitality” stretched the public/private dimension by considering the appropriation of technology in the home by house guests.  The full proceedings are available at the PPD10 website.

AVI 2010

AVI is always a joy, and AVI 2010 no exception; a biennial, single-track conference with high-quality papers (20% accept rate this year), and always in lovely places in Italy with good food and good company!  I first went to AVI in 1996 when it was in Gubbio to give a keynote “Closing the Loop: modelling action, perception and information“, and have gone every time since — I always say that Stefano Levialdi is a bit like a drug pusher, the first experience for free and ever after you are hooked! The high spot this year was undoubtedly Hitomi Tsujita‘s “Complete fashion coordinator2, a system for using social networking to help choose clothes to wear — partly just fun with a wonderful video, but also a very thoughtful mix of physical and digital technology.


images from Complete Fashion Coordinator

The keynotes were all great, Daniel Keim gave a really lucid state of the art in Visual Analytics (more later) and Patrick Lynch a fresh view of visual understanding based on many years experience and highlighting particularly on some of the more immediate ‘gut’ reactions we have to interfaces.  Daniel Wigdor gave an almost blow-by-blow account of work at Microsoft on developing interaction methods for next-generation touch-based user interfaces.  His paper is a great methodological exemplar for researchers combining very practical considerations, more principled design space analysis and targeted experimentation.

Looking more at the detail of Daniel’s work at Microsoft, it is interesting that he has a harder job than Apple’s interaction developers.  While Apple can design the hardware and interaction together, MS as system providers need to deal with very diverse hardware, leading to a ‘least common denominator’ approach at the level of quite basic touch interactions.  For walk-up-and use systems such as Microsoft Surface in bar tables, this means that users have a consistent experience across devices.  However, I did wonder whether this approach which is basically the presentation/lexical level of Seeheim was best, or whether it would be better to settle at some higher-level primitives more at the Seeheim dialog level, thinking particularly of the way the iPhone turns pull down menus form web pages into spinning selectors.  For devices that people own it maybe that these more device specific variants of common logical interactions allow a richer user experience.

The complete AVI 2010 proceedings (in colour or B&W) can be found at the conference website.

The very last session of AVI was a panel I chaired on “Visual Analytics: people at the heart of data” with Daniel Keim, Margit Pohl, Bob Spence and Enrico Bertini (in the order they sat at the table!).  The panel was prompted largely because the EU VisMaster Coordinated Action is producing a roadmap document looking at future challenges for visual analytics research in Europe and elsewhere.  I had been worried that it could be a bit dead at 5pm on the last day of the conference, but it was a lively discussion … and Bob served well as the enthusiastic but also slightly sceptical outsider to VisMaster!

As I write this, there is still time (just, literally weeks!) for final input into the VisMaster roadmap and if you would like a draft I’ll be happy to send you a PDF and even happier if you give some feedback 🙂

Search Computing

I was invited to go to this one-day workshop and had the joy to travel up on the train from Rome with Stu Card and his daughter Gwyneth.

The search computing workshop was organised by the SeCo project. This is a large single-site project (around 25 people for 5 years) funded as one of the EU’s ‘IDEAS Advanced Grants’ supporting ‘investigation-driven frontier research’.  Really good to see the EU funding work at the bleeding edge as so many national and European projects end up being ‘safe’.

The term search computing was entirely new to me, although instantly brought several concepts to mind.  In fact the principle focus of SeCo is the bringing together of information in deep web resources including combining result rankings; in database terms a form of distributed join over heterogeneous data sources.

The work had many personal connections including work on concept classification using ODP data dating back to aQtive days as well as onCue itself and Snip!t.  It also has similarities with linked data in the semantic web word, however with crucial differences.  SeCo’s service approach uses meta-descriptions of the services to add semantics, whereas linked data in principle includes a degree of semantics in the RDF data.  Also the ‘join’ on services is on values and so uses a degree of run-time identity matching (Stu Card’s example was how to know that LA=’Los Angeles’), whereas linked data relies on URIs so (again in principle) matching has already been done during data preparation.  My feeling is that the linking of the two paradigms would be very powerful, and even for certain kinds of raw data, such as tables, external semantics seems sensible.

One of the real opportunities for both is to harness user interaction with data as an extra source of semantics.  For example, for the identity matching issue, if a user is linking two data sources and notices that ‘LA’ and ‘Los Angeles’ are not identified, this can be added as part of the interaction to serve the user’s own purposes at that time, but by so doing adding a special case that can be used for the benefit of future users.

While SeCo is predominantly focused on the search federation, the broader issue of using search as part of algorithmics is also fascinating.  Traditional algorithmics assumes that knowledge is basically in code or rules and is applied to data.  In contrast we are seeing the rise of web algorithmics where knowledge is garnered from vast volumes of data.  For example, Gianluca Demartini at the workshop mentioned that his group had used the Google suggest API to extend keywords and I’ve seen the same trick used previously3.  To some extent this is like classic techniques of information retrieval, but whereas IR is principally focused on a closed document set, here the document set is being used to establish knowledge that can be used elsewhere.  In work I’ve been involved with, both the concept classification and folksonomy mining with Alessio apply this same broad principle.

The slides from the workshop are appearing (but not all there yet!) at the workshop web page on the SeCo site.

  1. yes I know this doesn’t give ‘PPD’ this stands for “public and private displays”[back]
  2. Hitomi Tsujita, Koji Tsukada, Keisuke Kambara, Itiro Siio, Complete Fashion Coordinator: A support system for capturing and selecting daily clothes with social network, Proceedings of the Working Conference on Advanced Visual Interfaces (AVI2010), pp.127–132.[back]
  3. The Yahoo! Related Suggestions API offers a similar service.[back]