changing rules of copyright on the web – the NLA case

I’ve been wondering about the broader copyright implications of a case that went through the England and Wales Court of Appeal earlier this year. The case was brought by the NLA (Newspaper Licensing Agency) against Meltwater, who run commercial media-alert services, for example telling you or your company when and where you have been mentioned in the press.

While the case is specifically about a news service, it appears to have broader implications for the web, not least because it makes new judgements on:

  • the use of titles/headlines — they are copyright in their own right
  • the use of short snippets (in this case no more than 256 characters) — they too potentially infringe copyright
  • whether a URL link is sufficient acknowledgement of copyright material for fair use – it isn’t!

These, particularly the last, seem to have implications for any form of publicly available lists, bookmarks, summaries, or even search results on the web. While NLA specifically allow free services such as Google News and Google Alerts, it appears that this is ‘grace and favour’, not use by right. I am reminded of the Shetland case1, which led to many organisations having paranoid policies regarding external linking (e.g. seeking explicit permission for every link!).

So, in the UK at least, web copyright law changed significantly through precedent, and I didn’t even notice at the time!

In fact, the original case was heard more than a year ago, in November 2010 (full judgement), and the appeal in July 2011 (full judgement), but it is sufficiently important that the NLA are still headlining it on their home page (see below, and also their press releases (PDF) about the original judgement and appeal). So effectively things changed at least at that point, although, as this is a judgement about law, not new legislation, it presumably also acts retrospectively. However, I only became aware of it recently, after seeing a notice in The Times last week – I guess because it is time for annual licences to be renewed.

Newspaper Licensing Agency (home page) on 26th Dec 2011

The actual case was, in summary, as follows. Meltwater News produce commercial media monitoring services that include the title, first few words, and a short snippet of news items that satisfy some criteria, for example mentioning a company name or product. NLA have a licence agreement for such companies and for those using such services, but Meltwater claimed it did not need such a licence and, even if it did, its clients certainly did not require any licence. However, the original judgement and the appeal found pretty overwhelmingly in favour of NLA.

In fact, my gut feeling in this case was with the NLA.  Meltwater were making substantial money from a service that (a) depends on the presence of news services and (b) would, for equivalent print services, require some form of licence fee to be paid.  So while I actually feel the judgement is fair in the particular case, it makes decisions that seem worrying when looked at in terms of the web in general.

Summary of the judgement

The appeal upheld the original judgement, so I summarise the main points from the latter (indented text is quoted from the judgement).

Headlines

The status of headlines (and I guess by extension book titles, etc.) in UK law is certainly materially changed by this ruling (paras. 70/71) from previous case law (Fairfax, para. 62).

Para. 70. The evidence in the present case (incidentally much fuller than that before Bennett J in Fairfax -see her observations at [28]) is that headlines involve considerable skill in devising and they are specifically designed to entice by informing the reader of the content of the article in an entertaining manner.

Para. 71. In my opinion headlines are capable of being literary works, whether independently or as part of the articles to which they relate. Some of the headlines in the Daily Mail with which I have been provided are certainly independent literary works within the Infopaq test. However, I am unable to rule in the abstract, particularly as I do not know the precise process that went into creating any of them. I accept Mr Howe’s submission that it is not the completed work as published but the process of creation and the identification of the skill and labour that has gone into it which falls to be assessed.

Links and fair use

The ruling explicitly says that a link is not sufficient acknowledgement in terms of fair use:

Para. 146. I do not accept that argument either. The Link directs the End User to the original article. It is no better an acknowledgment than a citation of the title of a book coupled with an indication of where the book may be found, because unless the End User decides to go to the book, he will not be able to identify the author. This interpretation of identification of the author for the purposes of the definition of “sufficient acknowledgment” renders the requirement to identify the author virtually otiose.

Links as copies

Para 45 (not part of the judgement, but part of NLA’s case) says:

Para. 45. … By clicking on a Link to an article, the End User will make a copy of the article within the meaning of s. 17 and will be in possession of an infringing copy in the course of business within the meaning of s. 23.

The argument here is that the Publishers’ websites have terms and conditions saying they are not for commercial use.

As far as I can see the judge equivocates on this issue, but happily does not seem convinced:

Para 100. I was taken to no authority as to the effect of incorporation of terms and conditions through small type, as to implied licences, as to what is commercial user for the purposes of the terms and conditions or as to how such factors impact on whether direct access to the Publishers’ websites creates infringing copies. As I understand it, I am being asked to take a broad brush approach to the deployment of the websites by the Publishers and the use by End Users. There is undoubtedly however a tension between (i) complaining that Meltwater’s services result in a small click-through rate (ii) complaining that a direct click to the article skips the home page which contains the link to the terms and conditions and (iii) asserting that the End Users are commercial users who are not permitted to use the websites anyway.

Free use

Finally, the following extract suggests that NLA would not be seeking to enforce the full licence on certain free services:

Para. 20. The Publishers have arrangements or understandings with certain free media monitoring services such as Google News and Google Alerts whereby those services are currently licensed or otherwise permitted. It would apparently be open to the End Users to use such free services, or indeed a general search engine, instead of a paid media monitoring service without (currently at any rate) encountering opposition from the Publishers. That is so even though the End Users may be using such services for their own commercial purposes. The WEUL only applies to customers of a commercial media monitoring service.

Of course, the fact that they allow it without licence suggests they feel the same copyright rules do apply, that is, that search collation services are subject to copyright. The judge does not make a big point of this piece of evidence, which would suggest that these free services do not have a right to abstract and link. However, the fact that Meltwater (the agency NLA is acting against) is making substantial money was clearly noted by the judge, as was the fact that users could choose to use alternative services for free.

Thinking about it

As noted, my gut feeling is that fairness lies with the newspapers involved; news gathering and reporting is costly, and openly accessible online newspapers are of benefit to us all, so if news providers are unable to make money, we all lose.

Indeed, years ago in dot.com days, at aQtive we were very careful that onCue, our intelligent internet sidebar, did not break the business models of the services we pointed to. While we effectively pre-filled forms and submitted them silently, we did not scrape results and present these directly, but instead sent the user to the web page that provided the information. This was partly out of a feeling that this was the right and fair thing to do, partly because if we treated others fairly they would be happy for us to provide this value-added service on top of what they provided, and partly because we relied on these third-party services for our business, so our commercial success relied on theirs.

This would all apply equally to the NLA v. Meltwater case.

However, like the Shetland case all those years ago, it is not the particulars of the case that seem significant, but the wide-ranging implications. I, like so many others, frequently cite web materials in blog posts, web pages and resource lists by title alone, with the title as a live link to the source site. According to this judgement the title is copyright, and even if my use of it is “fair use” (as it normally would be), the live link is NOT sufficient acknowledgement.

Maybe things are not quite so bad as they seem. In the NLA vs. Meltwater case, the NLA had a specific licence model and agreement. The NLA were not seeking retrospective damages for copyright infringement before this was in place, merely requiring that Meltwater subscribe fully to the licence. The issue was not just that copyright had been infringed, but that it had been infringed when there was a specific commercial option in place. In UK copyright law, I believe, it is not sufficient to show that copyright has been infringed; one must also show that the copyright owner has been materially disadvantaged by the infringement. So the existence of the licence option was probably critical to the specific judgement. However, the general principles probably apply to any case where the owner could claim damage … and maybe claim so merely in order to seek an out-of-court settlement.

This case was resolved five months ago, and I’ve not heard of any rush of law firms creating vexatious copyright claims.  So maybe there will not be any long-lasting major repercussions from the case … or maybe the storm is still to come.

Certainly, the courts have become far more internet-savvy since the 1990s, but judges can only deal with the laws they are given, and it is not at all clear that law-makers really understand the implications of their legislation for the smooth running of the web.

  1. This was the case in the late 1990s where the Shetland Times sued the Shetland News for including links to its articles. Although the particular case involved material that appeared to be re-badged, the legal issues threatened the very act of linking. See NUJ Freelance “NUJ still supports Shetland News in internet case”, BBC “Shetland Internet squabble settled out of court”, The Lawyer “Shetland Internet copyright case is settled out of court”.

Six weeks on the road

I’ve been at home for the last week after six weeks travelling around the UK and elsewhere. I didn’t keep up while on the road, so I am doing a retrospective post on it all, and I need to try to catch up on other half-written posts.

As well as time at Talis offices in B’ham and at Lancs (including exam board week), travels have taken me to Pisa for a workshop on ‘Supportive User Interfaces’, to Koblenz for the Web Science conference, giving a talk on embodiment issues and a poster on web-scale reasoning, to Newcastle for the British HCI conference, giving a talk on vfridge, to Nottingham to give a talk on extended episodic experience, and back to Lancs for a session on creativity! Why can’t I be like sensible folks and talk on one topic!

Supportive User Interfaces

On Monday 13th June I attended a workshop in Pisa on “Supportive User Interfaces”, which include interfaces that adapt in various ways to users. The majority of people there were involved in various forms of model-based user interfaces, in which models of the task, application and interaction are used to generate user interfaces on the fly. W3C have had a previous group in this area; Dave Raggett from W3C was at the workshop and it sounds like there will be a new working group soon. This clearly has strong links to various forms of ‘meta-level’ representations of data, tasks, etc. My own contribution started the day, framing the area, focusing partly on reasons for having more ‘meta-level’ interfaces, including social empowerment, and partly on the principles/techniques that need to be considered at a human level.

Also on Monday was a meeting of IFIP Working Group 2.7/13.4. IFIP is the UNESCO-founded pan-national agency to which national computer societies such as the BCS in the UK and ACM and IEEE Computer in the US belong. Working Group 2.7/13.4 is focused on the engineering of user interfaces. I had been actively involved in the past, but have had many years’ lapse. However, this seemed a good thing to re-engage with, with my new Talis hat on!

Web Science Conference in Koblenz

Jaime Teevan from Microsoft gave the opening keynote at WebSci 2011. I know her from her earlier work on personal information management, but her recent work, and her keynote, were about analysing and visualising changes in web pages. Web page changes are also analysed alongside users’ re-visitation patterns; by looking at the frequency of re-visitation Jaime and her colleagues are able to identify the parts of pages that change with similar frequency, helping them, inter alia, to improve search ranking.

I had many great conversations, some with people I knew previously (e.g. the Southampton folks), but also with new people, including the group at Troy that does lots of work with data.gov. I was particularly interested in some work using content matching to look for links between otherwise unlinked (or only partly inter-linked) datasets. There were also lots of good presentations, including one on trust prediction and a fantastic talk by Mark Bernstein from Eastgate, which he delivered in blank verse!

My own contribution included the poster that Dave@Talis prepared, on the web-scale spreading activation work in collaboration with Univ. Athens. It is quite a niche area for a multi-disciplinary conference, so it didn’t elicit quite the interest of the social networking posters, but it did lead to a small number of in-depth discussions.

In addition, I gave a talk on the more cognitive/philosophical issues that arise when we start to use the web as an external extension to / replacement of memory, including its impact on education. I got some good feedback on this.

The closing keynote was from Barry Wellman, the guy who started social network analysis way before social networks were on computers. At one point he challenged the Dunbar number1. I wondered whether this was due to cognitive extension with address books etc., but he didn’t seem to think so; there is evidence that some large circles predate the web (although maybe not physical address books). It made me wonder about itinerant tradesmen, tinkers, etc., even with no prostheses. Maybe the numbers sort of apply to any single context, but are repeated for each new context?

The HCI Conference – Newcastle

I attended the British HCI conference in Newcastle. This was the 25th conference, and as my very first academic paper in computing2 was at the first BHCI in 1984, I was pleased to be there at this anniversary. The paper I was presenting was a retrospective on vfridge, a social networking site dating back to 1999/2000, so it seemed an historic occasion!

As always, the presentations were all interesting. Strictly, BHCI is a ‘second tier’ conference compared with CHI, but why is it that I always find the papers more interesting, that I learn more? It is likely that a fair number of papers were CHI rejects, so it should be the other way round; is it that selectivity and ‘quality’ inevitably become conservative and boring?

Gregory Abowd gave the closing keynote. It was great to see Gregory again; we meet too rarely. The main focus of his keynote was on three aspects of research (novelty, value and reliability) and how his own work had moved within this space over the years. In particular, having two autistic sons has led him in directions he would never have considered, and this immediately valuable work has also created highly novel research. Novelty and value can coexist.

Gregory also reflected on the BHCI conference, as it was his early academic ‘home’ when he did his PhD and postdoc here in the late 1980s. He suggested that, rather than being (as with many conferences) a second best to getting a CHI paper, it could instead be a place for (not getting the quote quite perfect) “papers that should get into CHI”, by which he meant a proving ground for new ideas that would then go on to be in CHI.

Alan at conference dinner

However, I initially read the quote differently. BHCI always had a broader concept of HCI compared with CHI’s quite limited scope. That is, BHCI as a place that points the way for the future of HCI, just as it was the early nurturing place of MobileHCI. However, CHI has now become much broader in its own conception, so maybe this is no longer necessary. Indeed, at the althci session the organisers said that their only complaint was that the papers were not ‘alt’ enough – that maybe ‘alt’ had become mainstream. This prompted Russell Beale to suggest that maybe althci should now be real science, such as replication!

Gregory also noted the power of the conference as a meeting ground. It has always been proud of the breadth of international attendance, but perhaps it is UK saturation that should be its real measure of success. Of course, the conference agenda has become so full, and international travel so much cheaper than it was, that there is a tendency to go to the more topic-specific international conferences and neglect the UK scene. This is compounded by the relative dearth of the small UK day workshops that used to be so useful in nurturing new researchers.

Tom at conference dinner

I feel a little guilty here, as this was the first BHCI I had been to since it was in Lancaster in 2007 … as Tom McEwan pointed out, I always apologise but never come! However, to be fair, I have also only been to CHI twice in the last 10 years, and then when it was in Vienna and Florence. I have just felt too busy, so have avoided conferences that I did not absolutely have to attend.

In response to Gregory’s comments, someone, maybe Tom, mentioned that in days of metrics-based research assessment there was a tendency to submit one’s best work to those venues likely to achieve highest impact, hence the draw of CHI. However, I have hardly ever published in CHI and I think only once in TOCHI, yet, according to Microsoft Research, I am currently the most highly cited HCI researcher over the last 5 years … So you don’t have to publish in CHI to get impact!

And incidentally, the vfridge paper had NOT been submitted to CHI, but was specially written for BHCI as it seemed the fitting place to discuss a thoroughly British product 🙂

Nottingham MRL

I was at the Mixed Reality Lab in Nottingham for Joel Fischer’s PhD viva and, while there, gave a seminar in the afternoon on “extended episodic experience”, based on Haliyana Khalid’s PhD work and ideas that arose from it. Basically, while ‘user experience’ has become a big issue, most of the work is focused on individual ‘experiences’, whereas much of life consists of an ongoing series of experiences (episodes) which together make up the whole experience of interacting with a person or place, following a band, etc.

I had obviously not done a good enough job at wearing Joel down with difficult questions in the PhD viva in the morning, as he was there in the afternoon to ask difficult questions back of his own 😉

Docfest – Digital Economy Summer School

The last major event was Docfest, which brought together the PhD students from the digital economy centres from around the country. I’m not sure of the exact count, but I think there were just short of 150 participants. They come from a wide variety of backgrounds (business, design, computing, engineering), and many are mature students with years of professional experience behind them.

This looked like being a super event; unfortunately I was only able to attend for a day 🙁 However, I had a great evening at the welcome event talking with many of the students, and even got to ride in Steve Forshaw’s Sinclair C5!

My contribution to the event was running the first morning session on ‘creativity’. Surprise, surprise, this started with a bad ideas session, but it was new for me too, as the largest group I had run in the past was around 30. There were a number of local Highwire students acting as facilitators for the groups, so I had only to set them off and observe the results :-). At the end of the morning I gave some of the theoretical background to bad ideas as a method and in understanding (aspects of) creativity more widely.

Other speakers at the event included Jane Prophet, Chris Csikszentmihalyi and Chris Bonnington, so I was sad to miss them, although I did get a fascinating chat with Jane over breakfast in the hotel, hearing about her new projects on arts and neural imaging, and on how repetitious writing induces temporary psychosis … That is why the teachers give lines, to send the pupils bonkers!

  1. The idea that there are fundamental cognitive limits on social groups, with different-sized circles: family ~6, extended family ~20, village ~60, large village ~200.
  2. I had published previously in agricultural engineering.

the real tragedy of the commons

I’ve just been reviewing a paper that mentions the “tragedy of the commons”1, and whenever I read or hear the phrase I feel the hackles on the back of my neck rise.

Of course, the real tragedy of the commons was not free-riding and depletion by common use, but the rape of the land under mass eviction or enclosure movements when they ceased to be commons. The real tragedy of “the tragedy of the commons” as a catch phrase is that it is often used to promote the very same practices of centralisation. Where common land has survived today, just as in the time before enclosures and clearances, it is still managed in a collaborative way, both for the people now and for the sake of future generations. Indeed, on Tiree, where I live, there are large tracts of common grazing land managed in just such a way.

It is good to see that the Wikipedia article on “Tragedy of the Commons” does give a rounded view of the topic, including reference to an historical and political critique by Ian Angus2.

The paper I was reading was not alone in uncritically using the phrase.  Indeed in “A Framework for Web Science”3 we read:

In a decentralised and growing Web, where there are no “owners” as such, can we be sure that decisions that make sense for an individual do not damage the interests of users as a whole? Such a situation, known as the ‘tragedy of the commons’, happens in many social systems that eschew property rights and centralised institutions once the number of users becomes too large to coordinate using peer pressure and moral principles.

In fact, I do have some sympathy with this, as the web involves a vast number of physically dispersed users who are perhaps “too large to coordinate using peer pressure and moral principles”. However, what is strange is that the web has raised so many modern counter-examples to the tragedy of the commons, not least Wikipedia itself. In many open source projects people work in what is effectively a gift economy, where, if there is any reward, it is in the form of community or individual respect.

Clearly, there are examples in the world today where many individual decisions (often for short-term gain) lead to larger-scale collective loss. This is most clearly evident in the environment, but also in the recent banking crisis, which was fuelled by the desire for large mortgages and general debt-led lives. However, these are exactly the opposite of the values surrounding traditional common goods.

It may be that the problem is not so much that large numbers of people dilute social and moral pressure, but that the impact of our actions becomes too diffuse to be able to appreciate when we make our individual life choices.  The counter-culture of many parts of the web may reflect, in part, the way in which aspects of the web can make the impact of small individual actions more clear to the individual and more accountable to others.

  1. Garrett Hardin, “The Tragedy of the Commons”, Science, Vol. 162, No. 3859 (December 13, 1968), pp. 1243-1248. … and here is the danger of citation counting as a quality metric: I am citing it because I disagree with it!
  2. Ian Angus, “The Myth of the Tragedy of the Commons”, Socialist Voice, August 24, 2008.
  3. Berners-Lee, T., Hall, W., Hendler, J. A., O’Hara, K., Shadbolt, N. and Weitzner, D. J. (2006) A Framework for Web Science. Foundations and Trends in Web Science, 1 (1). pp. 1-130. http://eprints.ecs.soton.ac.uk/13347/

announcing Tiree Tech Wave!

Ever since I came to Tiree I’ve had a vision of bringing people here, to share some of the atmosphere and work together.  A few of you have come on research visits and we have had some really productive times.  Others have said they wished they could come sometime.

Well now is your chance …

Come to Tiree Tech Wave in March to make, talk and play at the wind-ripping edge of digital technology.

Every year Tiree hosts the Wave Classic, a key international wind surfing event.  Those of us at the edge of the digital wave do not risk cold seas and bodily injury, but there is something of the same thrill as we explore the limits of code, circuit boards and social computation.

The cutting edge of wind-surfing boards is now high technology, but typically made by artisan craftsfolk, themselves often surfers. Similarly, hardware platforms such as Arduino, mobile apps for iPhone and Android, and web mashups enabled by public APIs and linked data are all enabling a new maker culture, challenging the hegemony of global corporations.

The Western Celtic fringes were one of the oases of knowledge and learning during the ‘dark ages’. There is something about the empty horizon that helped the hermit to focus on God and inspired a flowering of decorative book-making, even in the face of battering storms of winter and Viking attacks of summer; a starkness that gave scholars time to think in peace between danger-fraught travel to other centres of learning across Europe.

Nowadays regular Flybe flights and Calmac ferries reduce the risk of Viking attacks whilst travelling to the isles, broadband Internet and satellite TV invade the hermit cell, and double glazing and central heating mollify the elements.  Yet there is still a rawness that helps focus the mind, a slightly more tenuous connection to the global infrastructure that fosters a spirit of self-reliance and independence.

Over a long weekend, 17 – 21 March (TBC), we plan what I hope will be a semi-regular event. A time to step out, albeit momentarily, from a target-driven world, to experiment and play with hardware and software, to discuss the issues of our new digital maker culture, what we know and what we seek to understand, and above all to make things together.

This is all about technology and people: the physical device that sits in our hands, the data.gov.uk mashup that tells us about local crime, the new challenges to personal privacy and society and the nation state.

Bring your soldering iron and Arduino boards, your laptop and API specs, your half-written theses and semi-formed ideas, your favourite book or even well-loved eReader (!). The format will be informal, with lots of time to work hands-on together; however, there will be the opportunity for short talks/demos/how-to-do-it sessions. Also, if there is demand, I’d be happy to do some more semi-formal tutorial sessions and maybe others would too (Arduino making, linked data).

Currently we have no idea whether there will be three or three hundred people interested, but we are aiming for something like 15 – 30 participants. We’ll keep costs down, probably around £70 for meeting rooms, lunches, etc. over the five days, but will confirm that and more details shortly.

Follow on Twitter at @tireetechwave; the website will be at tireetechwave.com. However, it is still ‘under development’, so don’t be surprised at the odd glitch over the next couple of weeks as we sort out details.

If you are interested in coming or want to know more, mail me or Graham Dean.

Web Art/Science Camp — how web killed the hypertext star and other stories

I had a great day on Saturday at the Web Art/Science Camp (twitter: #webartsci, lanyrd: web-art-science-camp). It was the first event that I went to primarily with my Talis hat on, and my first Web Science event, so I was very pleased that Clare Hooper told me about it during the DESIRE Summer School.

The event started on Friday night with a lovely meal in the restaurant at the British Museum. The museum was partially closed in the evening, but in the open galleries the Rosetta Stone, the Elgin Marbles and a couple of enormous totem poles were all very impressive. … and I notice that the BM’s website, when it describes the Parthenon Sculptures, does not waste the opportunity to tell us why they should not be returned to Greece!

Treasury of Atreus

I was fascinated too by images of the “Treasury of Atreus” (which is actually a Greek tomb, also known as the Tomb of Agamemnon). The tomb has a corbelled arch (triangular stepped stones, as visible in the photo) to relieve load on the lintel. However, whilst the corbelled arch was an important technological innovation, the aesthetics of the time meant they covered up the triangular opening with thin slabs of fascia stone and made it look as though the lintel was actually supporting the wall above, rather like modern concrete buildings with decorative classical columns.

how web killed the hypertext star

On Saturday, the camp proper started with Paul de Bra from TU/e giving a sort of retrospective on pre-web hypertext research and whether there is any need for hypertext research anymore. The talk brought out several of the issues that have also worried me for some time; so many of the lessons of early hypertext have been lost in the web1.

For me, one of the most significant issues is external linkage. HTML embeds links in the document using <a> anchor tags, so that only the links that the author has thought of can be present (and only one link per anchor). In contrast, mature pre-web hypertext systems, such as Microcosm2, specified links externally to the document, so that third parties could add annotation and links. I had a few great chats about this with one of the Southampton Web Science DTC students, in particular about whether Google or Wikipedia effectively provide all the external links one needs.
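To make the contrast concrete, here is a minimal Python sketch of an external linkbase: links are held in a separate store keyed by anchor phrase and resolved when a page is displayed, so a third party can attach links (several per anchor, if they like) to a page they cannot edit. It is loosely in the spirit of the generic links of systems such as Microcosm, not a reconstruction of Microcosm itself; the class and method names (`Linkbase`, `add_link`, `links_for`) and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Linkbase:
    """Hypothetical external linkbase: links live outside the documents they annotate."""

    # Maps a lower-cased anchor phrase to a list of (target, author) pairs.
    links: dict = field(default_factory=dict)

    def add_link(self, anchor: str, target: str, author: str) -> None:
        # Anyone may add a link; the source document itself is never edited.
        self.links.setdefault(anchor.lower(), []).append((target, author))

    def links_for(self, document: str) -> list:
        # Links are resolved at display time: any stored anchor phrase that
        # occurs in the document contributes all of its targets.
        text = document.lower()
        found = []
        for anchor, targets in self.links.items():
            if anchor in text:
                found.extend((anchor, target, author) for target, author in targets)
        return found


# Two readers attach different links to the same phrase in a page
# neither of them wrote; one anchor can carry several links.
lb = Linkbase()
lb.add_link("Memex", "https://example.org/memex-notes", "reader-a")
lb.add_link("Memex", "https://example.org/bush-1945", "reader-b")
page = "Bush's Memex was never built."
for anchor, target, author in lb.links_for(page):
    print(anchor, target, author)
```

Matching by plain substring is the simplest possible choice; a real linkbase would at least respect word boundaries and record where in a document each link was made.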

Paul’s brief history of hypertext started, predictably, with Vannevar Bush’s “As We May Think” and Memex; however, he pointed out that Bush’s vision was based on associative connections (like the human mind) and trails (a form of narrative), not pairwise hypertext links. The latter reminded me of Nick Hammond’s bus tour metaphor for guided educational hypertext in the 1980s; I have occasionally seen things a little like this since, and indeed narrative was an issue that arose in different guises throughout the day.

While Bush’s trails are at least related to the links of later hypertext and the web, the idea of associative connections seems to have been virtually forgotten. More recently on the web, however, IR (information retrieval) based approaches to page suggestion, such as Alexa, and content-based social networking have elements of associative linking, as does the use of spreading activation in web contexts3.
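Since spreading activation may be unfamiliar, here is a minimal Python sketch of the generic idea: activation starts at a seed node and flows along weighted edges, decaying at each hop, so nodes connected to the seed by several or strong paths end up most active, a crude form of associative connection. The graph, the `decay` factor and the number of `steps` are illustrative assumptions; this is a textbook form, not the web-scale algorithm of any particular system.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Generic spreading activation over a weighted directed graph.

    graph: dict mapping node -> {neighbour: edge weight in [0, 1]}
    seeds: dict mapping node -> initial activation
    Returns the accumulated activation of every node reached.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                # Each hop passes on a decayed, weighted share of the energy.
                passed = energy * weight * decay
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        # Add this round's energy to the running totals.
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


graph = {
    "memex": {"trails": 1.0, "hypertext": 0.8},
    "trails": {"hypertext": 0.5},
    "hypertext": {"web": 0.6},
}
scores = spread_activation(graph, {"memex": 1.0})
# "hypertext" is reached by two paths, so it ends up more active than "trails".
ranked = sorted((n for n in scores if n != "memex"), key=scores.get, reverse=True)
print(ranked)  # ['hypertext', 'trails', 'web']
```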

It was of course Ted Nelson who coined the term hypertext, but Paul reminded us that Nelson’s vision of hypertext in Xanadu is far richer than the current web. As well as external linkage (and indeed more complex forms in his ZigZag structures, a form of faceted navigation), Xanadu’s linking was often in the form of transclusions: pieces of one document appearing, quoted, in another. Nelson was particularly keen on having only one copy of anything, hence the transclusion is not so much a copy as a reference to a portion. The idea of having exactly one copy seems a bit of a computing obsession, and in non-technical writing it is common to have quotations that are in some way edited (elision, emphasis), but to me the core point is that the target of a link, as well as its source, need not be the whole document but can be some fragment.
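The idea of a link whose target is a fragment, held by reference rather than by copy, can be sketched in a few lines. This is only a hypothetical illustration, not Xanadu's addressing scheme: a fragment records a document id and a character range, and is resolved against the source text whenever the quoting document is rendered.

```javascript
// Hypothetical store of source documents, keyed by id.
const sources = new Map([
  ["nelson", "Xanadu keeps one copy of each text and quotes it by reference."],
]);

// A transcluded fragment records where the text lives (document id plus a
// character range) rather than holding a copy of the characters.
function fragment(docId, start, end) {
  return { docId, start, end };
}

// A composite document is a list of parts: literal strings or fragments.
// Rendering resolves each fragment against the source text at display time.
function render(parts) {
  return parts
    .map((part) =>
      typeof part === "string"
        ? part
        : sources.get(part.docId).slice(part.start, part.end)
    )
    .join("");
}

const source = sources.get("nelson");
const start = source.indexOf("one copy");
const quoting = [
  "Nelson wanted ",
  fragment("nelson", start, start + "one copy".length),
  " of anything.",
];
console.log(render(quoting)); // Nelson wanted one copy of anything.
```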

Paul de Bra's keynote at Web Art/Science Camp (photo Clare Hooper)

Over a period of 30 years hypertext developed and started to mature … until in the early 1990s came the web, and so much of hypertext died with its birth … I guess a bit like the way Java all but stultified programming languages. Paul had a lovely list of the bad things about the web compared with (1990s) state-of-the-art hypertext:

Key properties/limitations in the basic Web:

  1. uni-directional links between single nodes
  2. links are not objects (have no properties of their own)
  3. links are hardwired to their source anchor
  4. only pre-authored link destinations are possible
  5. monolithic browser
  6. static content, limited dynamic content through CGI
  7. links can break
  8. no transclusion of text, only of images

Note that 1, 3 and 4 are all connected with the way that HTML embeds links in pages rather than adopting some form of external linkage. However, 2 is also interesting: links are not ‘first class objects’. This has been preserved in the semantic web, where an RDF triple is not itself easily referenced (except by complex ‘reification’), and so it is hard to add information about relationships, such as their provenance.
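To show why reification feels complex, here is a small sketch using plain JavaScript arrays for triples. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms are the standard RDF reification vocabulary; the ex: resources and the ex:assertedBy provenance property are hypothetical. Recording who asserted a single link takes six triples in total.

```javascript
// Triples as [subject, predicate, object]. The rdf: terms are the standard RDF
// reification vocabulary; the ex: names are hypothetical.
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Describe an existing triple as a resource (stmtId) so other triples can refer to it.
function reify(stmtId, [s, p, o]) {
  return [
    [stmtId, RDF + "type", RDF + "Statement"],
    [stmtId, RDF + "subject", s],
    [stmtId, RDF + "predicate", p],
    [stmtId, RDF + "object", o],
  ];
}

const link = ["ex:pageA", "ex:linksTo", "ex:pageB"];
const triples = [
  link,                                      // the relationship itself
  ...reify("ex:stmt1", link),                // four triples just to make it nameable
  ["ex:stmt1", "ex:assertedBy", "ex:alice"], // provenance about the relationship
];
console.log(triples.length); // 6
```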

Of course, the same simplicity (perhaps even simplism) that reduced the expressivity of HTML compared with earlier hypertext is also the reason for its success compared with earlier, more heavyweight and usually centralised solutions.

However, Paul went on to describe how many of the features that were lost have re-emerged in plugins and server enhancements (this made me think of systems such as zLinks, which start to add an element of external linkage). I wasn’t totally convinced, as these features are still largely in research prototypes and have not entered the mainstream, but it made a good end to the story!

demos and documentation

There was a demo session as well as some short demos as part of talks, with lots of interesting ideas. One that particularly caught my eye (although not incredibly webby) was Ana Nelson’s documentation generator “dexy” (not to be confused with doxygen, another documentation generator). Dexy allows you to include code and output, including screen shots, in documentation (LaTeX, HTML, even Word if you work a little) and updates the documentation live as the code changes (at least the code and output; you still need to change the words!). It seems to be both a test harness and a multi-version documentation compiler all in one!

I recall that many years ago, while he was still at York, Harold Thimbleby was doing something a little similar when he was working on his C version of Knuth’s WEB literate programming system. Ana’s system is language neutral and takes advantage of recent developments, in particular the use of VMs to test install scripts and to be sure code runs in a consistent environment. It can also use browser automation for web docs: very cool 🙂

Relating back to Paul’s keynote, this is exactly an example of Nelson’s transclusion: the code and outputs are included in the document but still tied to their original source.
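The transclusion idea behind dexy can be illustrated with a toy generator. This is not dexy's configuration or filter mechanism; the snippet table, the {{code:…}} and {{output:…}} placeholders and the buildDoc function are all hypothetical. Each rebuild re-runs the code, so the document stays tied to its source.

```javascript
// Hypothetical snippets (not dexy's format): source text that can be executed.
const snippets = {
  square: "const n = 7; return n * n;",
};

// Replace {{code:name}} with the snippet's source and {{output:name}} with the result
// of running it now, so every rebuild re-synchronises the document with the code.
// new Function is used only for brevity; a real tool would run code in an isolated environment.
function buildDoc(template) {
  return template.replace(/\{\{(code|output):(\w+)\}\}/g, (_, kind, name) => {
    const src = snippets[name];
    if (src === undefined) throw new Error(`unknown snippet: ${name}`);
    return kind === "code" ? src : String(new Function(src)());
  });
}

console.log(buildDoc("Code: {{code:square}}\nOutput: {{output:square}}"));
```

Editing the snippet (say, changing 7 to 8) and rebuilding updates both the quoted code and its output; only the surrounding prose has to be revised by hand.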

And on this same theme I demoed Snip!t as an example of both:

  1. attempting to bookmark parts of web pages, a form of transclusion
  2. using data detectors, a form of external linkage
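Data detectors can be sketched as pattern matchers whose links are computed from the text rather than written into it. This is a hypothetical illustration, not Snip!t's implementation; the postcode pattern is simplified and the map URL is a placeholder.

```javascript
// Hypothetical data detectors: each recognises one kind of data in plain text and
// proposes a link for it. The links come from a separate layer, not from the text.
const detectors = [
  {
    type: "uk-postcode",
    // simplified pattern; it does not cover every valid UK postcode format
    pattern: /\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b/g,
    // placeholder map service URL
    link: (text) => "https://maps.example.com/search?q=" + encodeURIComponent(text),
  },
  {
    type: "email",
    pattern: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g,
    link: (text) => "mailto:" + text,
  },
];

// Run every detector over the text and collect the links it suggests.
function detect(text) {
  const found = [];
  for (const detector of detectors) {
    for (const match of text.matchAll(detector.pattern)) {
      found.push({ type: detector.type, text: match[0], href: detector.link(match[0]) });
    }
  }
  return found;
}

console.log(detect("Write to alan@example.org or visit AB12 3CD."));
```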

Another talk/demo also showed how Compendium could be used to annotate video (in the talk, regarding fashion design) and build rationale around it … yet another example of external linkage in action.

… and when looking after the event at some of Weigang Wang’s work on collaborative hypermedia, it was pleasing to see that it uses a theoretical framework for shared understanding in collaborative hypermedia that builds upon my own CSCW framework from the early 1990s 🙂

sessions: narrative, creativity and the absurd

Impossible to capture in a few words, but one session included different talks and discussion about the relation between narrative and various forms of web experience, including a talk on the cognitive psychology of the Kafkaesque. There was also a discussion of creativity, with Nathan live-recording it in IBIS!

what is web science

I guess inevitably in a new area there was some discussion about “what is web science” and even “is web science a discipline”. I recall similar discussions about the nature of HCI 25 years ago, which are still not entirely resolved today … and, as an artist who was there reminded us, artists still struggle with “what is art?”!

Whether or not there is a well defined discipline of ‘web science’, the web definitely throws up new issues for many disciplines, including new challenges for computing in terms of scale, and new opportunities for the social sciences in terms of intrinsically documented social interactions. One of the themes that recurred to distinguish web science from simply web technology is the human element: joy to my ears, of course, as an HCI man, but I think maybe not the whole story.

Certainly the gathering of people from different backgrounds in a sort of disciplinary bohemia is exciting whether or not it has a definition.

  1. see also “Names, URIs and why the web discards 50 years of computing experience“[back]
  2. Wendy Hall, Hugh Davis and Gerard Hutchings, “Rethinking Hypermedia: The Microcosm Approach”, Springer, 1996.[back]
  3. Spreading activation is used by a number of people; some of my own work with others at Athens, Rome and Talis is reported in “Ontologies and the Brain: Using Spreading Activation through Ontologies to Support Personal Interaction” and “Spreading Activation Over Ontology-Based Resources: From Personal Context To Web Scale Reasoning”.[back]

UK internet far from ubiquitous

On the last page of the Guardian on Saturday (13th Oct) in a sort of ‘interesting numbers’ section, they say that:

“30% of the UK population have no internet access at home”

I couldn’t find the exact source of this; however, another Guardian article, “UK internet audience rises by 1.9 million over last year”, dated Wednesday 30 June 2010, has a similar figure. This says that Internet use has grown to 38.8 million. The Office for National Statistics gives the overall UK population as 61,792,000, with 1/5 under 16; assuming ages are spread roughly evenly, about 10/16 of those, or 1 in 8 of the whole population, are under 10: around 8 million. That leaves roughly 54 million people aged 10 or over, so still only about 70% are actually using the web at all.
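The estimate above works out as follows, using only the figures quoted and the same assumption that ages are spread evenly:

```javascript
// Figures as quoted in the text (ONS population, Guardian internet audience).
const population = 61792000;
const under16 = population / 5;         // "1/5 under 16"
const under10 = under16 * (10 / 16);    // assumption: ages spread evenly, so 10/16 of the under-16s
const over10 = population - under10;
const internetUsers = 38800000;

console.log(Math.round(under10));       // 7724000
console.log(Math.round(over10));        // 54068000
console.log((100 * internetUsers / over10).toFixed(1) + "%"); // 71.8%
```

Rounding the under-10s up to 8 million instead gives about 72%; either way, roughly 3 in 10 of those aged 10 or over are not online.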

My guess is that some of the people with internet at home do not use it, and some of those without home connections use it by other means (mobile, at school, cybercafés), but by either measure we are hardly a society where the web is as ubiquitous as one might have imagined.

wisdom of the crowds goes to court

Expert witnesses often testify in court cases, whether on DNA evidence, IT security or blood splatter patterns. However, in the days of Web 2.0, who is the ‘expert’ witness? Would a true Web 2.0 court submit evidence to public comment, or maybe, like the Viking Thing or a Wild West lynch mob, let a vote of the masses using Facebook ‘Like’ determine guilt or innocence?

However, it will be a conventional judge, not the justice of social networks, who will adjudicate if the hoteliers threatening to sue TripAdvisor1 do indeed bring the case to court. When TripAdvisor seeks to defend its case, it will not rely on crowd-sourced legal opinions, but on lawyers whose advice is trusted because they are trained, examined and experienced, and who are held responsible for their advice. What is at stake is precisely the fact that TripAdvisor’s own site has none of these characteristics.

This may well, like the Shetland newspaper case in the 1990s2, become a critical precedent for many crowd-sourced sites and so is something we should all be watching.

Unlike Wikipedia or legal advice itself, ‘expertise’ is not the key issue in the case of TripAdvisor: every hotel guest is in a way the best expert as to their own experience. However, how is the reader to know that the reviews posted are really by disgruntled guests rather than business rivals? In science we are expected to declare sources of research funding, so that the reader can make judgements on the reliability of evidence funded by the tobacco or oil industry or indeed the burgeoning renewables sector. Those who flout these conventions and rules may expect their papers to be withdrawn and their careers to flounder. Similarly, if I make a defamatory public statement about a friend, colleague or public figure, then not only can the reliability of my words be judged by my own reputation for trustworthiness, but if my words turn out to be knowingly or culpably false and damaging then I can be sued for libel. In the case of TripAdvisor there are none of the checks and balances of science or the law, and yet the impact on individual hoteliers can make or break their business. Who is responsible for damage caused by any untrue or malicious reviews posted on the site: the anonymous ‘crowd’ or TripAdvisor?

Of course users of review sites are not stupid; they know (or do they?) that anonymous reviews should be taken with a pinch of salt. My guess is that a crucial aspect of the case may be the extent to which TripAdvisor itself appears to lend credence to the reviews it publishes. Indeed every page of TripAdvisor is headed with their strap line “World’s most trusted travel advice™”.

At the top of the home page there is also the phrase “Find Hotels Travelers Trust” and further down, “Whether you prefer worldwide hotel chains or cozy boutique hotels, you’ll find real hotel reviews you can trust at TripAdvisor”. The former arguably puts the issue of trust back to the reviewers, but the latter is definitely TripAdvisor asserting the trustworthiness of the reviews.

I think if I were at TripAdvisor I would be worried!

Issues of trust and reliability, provenance and responsibility are also going to be an important strand of the work I’ll be part of myself at Talis: how do we assess the authority of crowd-sourced material, how do we convey to users the level of reliability of the information they view, especially if it is ‘mashed’ from different sources, and how do we track the provenance of information in order to be able to do this? Talis is interested because, as a major provider and facilitator of open data, the reliability of the information it and its clients provide is a crucial part of that information: unreliable information is not information!

However, these issues are critical for everyone involved in the new web; if those of us engaged in research and practice in IT do not address these key problems then the courts will.

  1. see The Independent, “Hoteliers to take their revenge on TripAdvisor’s critiques in court“, Saturday 11th Sept. 2010[back]
  2. The case around 1996/1997 involved the Shetland Times obtaining an interim interdict (an injunction), on copyright grounds, against ‘deep linking’ by the rival Shetland News, that is, links directly to news stories bypassing the Shetland Times home page. This was widely reported at the time and became an important case in Internet law: see, for example, the Nov 1996 BBC News story or the netlitigation.com article. The out-of-court settlement allowed the deep linking so long as the link was clearly acknowledged. However, while the settlement was sensible, the uncertainty left by the case pervaded the industry for years, leading some sites to abandon link pages, or to link only after obtaining explicit permission, thus stifling the link economy of the web. [back]

Names, URIs and why the web discards 50 years of computing experience

Names and naming have always been a big issue both in computer science and philosophy, and a topic I have posted on before (see “names – a file by any other name”).

In computer science, and in particular programming languages, a whole vocabulary has arisen to talk about names: scope, binding, referential transparency. As in philosophy, it is typically the association between a name and its ‘meaning’ that is of interest. Names and words, whether in programming languages or day-to-day language, are what philosophers call ‘intentional’: they refer to something else. In computer science the ‘something else’ is typically some data or code or a placeholder/variable containing data or code, and the key question of semantics or ‘meaning’ is about how to identify which variable, function or piece of data a name refers to in a particular context at a particular time.

The emphasis in computing has tended to be about:

(a) Making sure names have unambiguous meaning when looking locally inside code. Concerns such as referential transparency, avoiding dynamic binding and the deprecation of global variables are about this.

(b) Putting boundaries on where names can be seen/understood, both as a means to ensure (a) and also as part of encapsulation of semantics in object-based languages and abstract data types.

However, there has always been a tension between clarity of intention (in both the normal and philosophical sense) and abstraction/reuse. If names are totally unambiguous then it becomes impossible to say general things. Without a level of controlled ambiguity in language a legal statement such as “if a driver exceeds the speed limit they will be fined” would need to be stated separately for every citizen. Similarly in computing when we write:

function f(x) { return (x+1)*(x-1); }

The meaning of x is different when we use it in ‘f(2)’ or ‘f(3)’, and must be so to allow ‘f’ to be used generically. Crucially, there is no internal ambiguity: the two ‘x’s refer to the same thing in a particular invocation of ‘f’, but the precise meaning of ‘x’ for each invocation is achieved by external binding (the argument list ‘(2)’).
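Here is that example made runnable, next to a hypothetical variant, for contrast, that reads a global variable instead of taking an argument; the global version loses exactly the local clarity described in (a), because its meaning depends on whatever the global holds at the moment of the call:

```javascript
// x is bound afresh by each call: within one invocation both uses of x agree,
// but the caller decides what x means.
function f(x) {
  return (x + 1) * (x - 1);
}
console.log(f(2), f(3)); // 3 8

// Hypothetical contrast: g reads a global, so its meaning is fixed outside its
// own text and silently changes whenever that global changes.
let y = 2;
function g() {
  return (y + 1) * (y - 1);
}
console.log(g()); // 3
y = 3;
console.log(g()); // 8
```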

Come the web and URLs and URIs.

Fiona@lovefibre was recently making a test copy of a website built using WordPress. In a pure HTML website, this is easy (so long as you have used relative or site-relative links within the site): you just copy the files, put them in the new location and they work 🙂 Occasionally a more dynamic site does need to know its global name (URL), for example if you want to send a link in an email, but this can usually be achieved using a configuration file. For example, there is a development version of Snip!t at cardiff.snip!t.org (rather than www.snipit.org), and there is just one configuration file that needs to be changed between this test site and the live one.
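One way to keep a site's global name in a single place is to store only relative paths and build absolute URLs on demand from one configured base. The sketch below assumes a hypothetical SITE_BASE setting and uses the standard URL constructor to resolve paths against it; it is not how Snip!t or WordPress is actually configured.

```javascript
// Hypothetical single configuration value: the one place the site's global name appears.
// Changing it is all that is needed to move between a test site and the live site.
const SITE_BASE = "https://test.example.org/";

// Content stores only relative paths; a full URL is built on demand,
// e.g. when a link has to go out in an email.
function absoluteUrl(relativePath) {
  return new URL(relativePath, SITE_BASE).href;
}

console.log(absoluteUrl("snips/42")); // https://test.example.org/snips/42
```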

Similarly, in a pristine WordPress install there is just such a configuration file and one or two database entries. However, as soon as it has been used to create a site, the database content becomes filled with URLs. Some are in clear locations, but many are embedded within HTML fields or serialised plugin options. Copying and moving the database requires a series of SQL updates with string replacements matching the old site name and replacing it with the new, which is both tedious and needs extreme care not to corrupt the database in the process.
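Serialised options are the trickiest part, because PHP's serialize() format records the length of each string, so a plain SQL string replacement that changes the length of a URL leaves the stored value unreadable. A minimal sketch of a length-aware replacement, assuming for simplicity that no serialised string contains the two-character sequence "; (this is an illustration, not a WordPress migration tool, and the option value is made up):

```javascript
// PHP's serialize() stores each string with its length in bytes, e.g.
//   s:22:"http://old.example.com";
// A plain search-and-replace changes the text but not the recorded length, which
// corrupts the value. This sketch rewrites the length as well.
// Assumption: no serialised string contains the character sequence ";
function replaceInSerialized(data, oldUrl, newUrl) {
  return data.replace(/s:(\d+):"([\s\S]*?)";/g, (whole, _len, str) => {
    if (!str.includes(oldUrl)) return whole; // untouched strings stay exactly as they were
    const updated = str.split(oldUrl).join(newUrl);
    const bytes = new TextEncoder().encode(updated).length; // length in UTF-8 bytes
    return `s:${bytes}:"${updated}";`;
  });
}

const option = 'a:1:{s:7:"siteurl";s:22:"http://old.example.com";}';
console.log(replaceInSerialized(option, "http://old.example.com", "https://test.example.org"));
// a:1:{s:7:"siteurl";s:24:"https://test.example.org";}
```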

Is this just a case of WordPress being poorly engineered?

In fact I feel it is more a problem endemic to the web, driven largely by the URL.

Recently I was experimenting with Firefox extensions. Being a good 21st-century programmer, I simply found an existing extension that was roughly similar to what I was after and started to alter it. First, of course, I changed its name, and then found I needed to make changes in pretty much every file in the extension, as knowledge of the extension name seemed to permeate to the lowest level of the code. To be fair, XUL has mechanisms to achieve a level of encapsulation by introducing local URIs through the ‘chrome:’ naming scheme, and, having been through the process once, I maybe understand a bit better how to design extensions to make them less reliant on the external name, and also which names need to be changed and which are more like the ‘x’ in the ‘f(x)’ example. However, despite this, the experience was so different from the levels of encapsulation I have learnt to take for granted in traditional programming.

Much of the trouble resides with the URL. Going back to the two issues of naming, the URL focuses strongly on (a), making the name unambiguous by having a single universal namespace; URLs are a bit like saying “let’s not just refer to ‘Alan’, but ‘the person with UK National Insurance Number XXXX’ so we know precisely who we are talking about”. Of course this focus on uniqueness of naming has a consequential impact on generality and abstraction. There are many visitors on Tiree over the summer, and maybe one day I meet one at the shop and then a few days later pass the same person out walking; I don’t need to know the person’s NI number or URL in order to say it was the same person.

Back to Snip!t: over the summer I spent some time working on its XML-based extension mechanism. As soon as the extensions became even slightly complex I found URLs sneaking in, just like the WordPress database 🙁 The use of namespaces in the XML file can reduce this by at least limiting full URLs to the XML header, but, still, embedded in every XML file are un-abstracted references … and my pride in keeping the test site and live site near identical was severely dented1.

In the years when the web was coming into being the Hypertext community had been reflecting on more than 30 years of practical experience, embodied particularly in the Dexter Model2. The Dexter model and some systems, such as Wendy Hall’s Microcosm3, incorporated external linkage; that is, the body of content had marked hot spots, but the association of these hot spots to other resources was in a separate external layer.

Sadly HTML opted for internal links in anchor and image tags in order to make HTML files self-contained, a pattern replicated across web technologies such as XML and RDF. At a practical level this is (i) why it is hard to have a single anchor link to multiple things, as was common in early Hypertext systems such as Intermedia, and (ii), as Fiona found, a real pain for maintenance!

  1. I actually resolved this by a nasty ‘hack’ of having internal functions alias the full site name when encountered and treat it as if it referred to the test site: very kludgy![back]
  2. Halasz, F. and Schwartz, M. 1994. The Dexter hypertext reference model. Commun. ACM 37, 2 (Feb. 1994), 30-39. DOI= http://doi.acm.org/10.1145/175235.175237[back]
  3. Hall, W., Davis, H., and Hutchings, G. 1996. Rethinking Hypermedia: the Microcosm Approach. Kluwer Academic Publishers.[back]

Apache: pretty URLs and rewrite loops

[another techie post – a problem I had and can see that other people have had too]

It is common in various web frameworks to pass pretty much everything through a central script using an Apache .htaccess file and mod_rewrite. For example, enabling permalinks in a WordPress blog generates an .htaccess file like this:

RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]

I use similar patterns for various sites, such as vfridge (see recent post “Phoenix rises”) and Snip!t. For Snip!t, however, I was using not a local .htaccess file but an AliasMatch in httpd.conf, which meant I needed to ask Fiona every time I wanted to make a change (as I can never remember the root passwords!). It seemed easier (even if slightly less efficient) to move this to a local .htaccess file:

RewriteEngine On
RewriteBase /
RewriteRule ^(.*)$ code/top.php/$1 [L]

The intention is to map “/an/example?args” into “/code/top.php/an/example?args”.

Unfortunately this resulted in a “500 internal server error” page, and in the Apache error log messages saying there were too many internal redirects. This seems to be a common problem reported in forums (see here, here and here). The reason is that .htaccess files are encountered very late in Apache’s processing, and so anything rewritten by the rules gets thrown back into Apache’s processing almost as if it were a fresh request. The “[L]” (last) flag says “don’t execute any more rules”, but this means “no more rules on this pass”; when Apache gets back to the .htaccess file in the fresh round, the rule is encountered again and again, leading to an infinite loop “/code/top.php/code/top.php/…/code/top.php/an/example?args”.
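The loop can be reproduced with a toy model. This is a simplification, not Apache's actual request handling: each rewrite is fed back through the same rule, as described above, until either nothing changes or a recursion limit is reached (10 mirrors the default of Apache's LimitInternalRecursion directive). The function names are hypothetical and query strings are ignored.

```javascript
// A toy model of per-directory rewriting (not Apache's actual code): after a rule
// rewrites the URL, the result goes through the .htaccess rules again as an internal
// redirect. Apache gives up after LimitInternalRecursion rounds (default 10) with a 500.
const RECURSION_LIMIT = 10;

// RewriteBase /  +  RewriteRule ^(.*)$ code/top.php/$1 [L]
// In .htaccess context the pattern is matched against the path without its leading "/".
function unguardedRule(path) {
  const [, rest] = /^(.*)$/.exec(path.slice(1));
  return "/code/top.php/" + rest;
}

// Keep re-processing until the rule no longer changes the URL, or the limit is hit.
function handleRequest(path, rule) {
  for (let round = 0; round < RECURSION_LIMIT; round++) {
    const next = rule(path);
    if (next === null || next === path) return { status: 200, path };
    path = next;
  }
  return { status: 500, path };
}

console.log(unguardedRule("/an/example"));              // /code/top.php/an/example
console.log(unguardedRule("/code/top.php/an/example")); // /code/top.php/code/top.php/an/example
console.log(handleRequest("/an/example", unguardedRule).status); // 500
```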

Happily, mod_rewrite thought of this and there is an additional “[NS]” (nosubreq) flag that says “only use this rule on the first pass”.  The mod_rewrite documentation for RewriteRule in Apache 1.3, 2.0 and 2.3 says:

Use the following rule for your decision: whenever you prefix some URLs with CGI-scripts to force them to be processed by the CGI-script, the chance is high that you will run into problems (or even overhead) on sub-requests. In these cases, use this flag.

I duly added the flag:

RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

This should work, but doesn’t.  I’m not sure why except that the Apache 2.2 documentation for NS|nosubreq reads:

NS|nosubreq

Use of the [NS] flag prevents the rule from being used on subrequests. For example, a page which is included using an SSI (Server Side Include) is a subrequest, and you may want to avoid rewrites happening on those subrequests.

Images, javascript files, or css files, loaded as part of an HTML page, are not subrequests – the browser requests them as separate HTTP requests.

This is identical to the documentation for 1.3, 2.0 and 2.3 except that the quote about “URLs with CGI-scripts” is singularly missing. I can’t find anything that says so, but my guess is that there was some bug (feature?) introduced in 2.2 that is being fixed in 2.3.

WordPress is immune to the infinite loop, as the directive “RewriteCond %{REQUEST_FILENAME} !-f” says “if the file exists, use it without rewriting”. As “index.php” is a file, the rule does not rewrite a second time. However, the layout of my files meant that I sometimes have an actual file in the pseudo location (e.g. /an/example really exists). I could have reorganised the complete directory structure … but then I would still be fixing all the broken links now!

Instead I simply added an explicit “please don’t rewrite my top.php script” condition:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI}  !^/code/top.php/.*
RewriteRule ^(.*)$ code/top.php/$1 [L,NS]
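A toy model (again a simplification, with hypothetical function names, and with the [NS] flag not modelled) shows why the extra condition breaks the loop: once the URL already starts with /code/top.php/, the condition fails, the rule is skipped and processing settles after one rewrite.

```javascript
// Toy model (not Apache's actual code) of the guarded rules:
//   RewriteCond %{REQUEST_URI} !^/code/top.php/.*
//   RewriteRule ^(.*)$ code/top.php/$1 [L,NS]
const RECURSION_LIMIT = 10; // mirrors the default of Apache's LimitInternalRecursion

function guardedRule(path) {
  // the condition fails for URLs already routed to top.php, so the rule is skipped
  if (/^\/code\/top.php\/.*/.test(path)) return null;
  return "/code/top.php/" + path.slice(1);
}

// Keep re-processing until the rule no longer rewrites, or the limit is hit.
function handleRequest(path, rule) {
  for (let round = 0; round < RECURSION_LIMIT; round++) {
    const next = rule(path);
    if (next === null || next === path) return { status: 200, path };
    path = next;
  }
  return { status: 500, path };
}

console.log(handleRequest("/an/example", guardedRule));
// { status: 200, path: '/code/top.php/an/example' }
```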

I suspect that this will be unnecessary when Apache upgrades to 2.3, but for now … it works 🙂