level of detail – scale matters

We are used to being able to zoom into every document, picture and map, but part of the cartographer’s skill is putting the right information at the right level of detail.  If you took detailed area maps and simply scaled them down, they would not make a good road atlas: the main motorways would hardly be visible, and the rest would look as if a spider had walked all over it.  Similarly, if you could zoom into a road atlas, you would discover that the narrow blue line of each motorway is in fact half a mile wide on the ground.

Nowadays we all use online maps that try to do this automatically.  Sometimes this works … and sometimes it doesn’t.

Here are three successive views of Google Maps focused on Bournemouth on the south coast of England.

On the first view we see Bournemouth clearly marked; on the next, zooming in a little, Poole, Christchurch and some smaller places also appear.  So far, so good: as we zoom in, more local names are shown as well as the larger places.

bournemouth-1  bournemouth-2

However, zoom in one more level and something weird happens: Bournemouth disappears.  Poole and Christchurch are there, but no Bournemouth.

bournemouth-3

However, looking at the same zoom level in another browser, Bournemouth is still there:

bournemouth-4

The difference between the two is the Hotel Miramar.  In the first browser I am logged into Google mail, so Google ‘knows’ I am booked to stay at the Hotel Miramar (presumably by scanning my email) and decides to display this too.  The labels for Bournemouth and the hotel overlap, so Google simply omitted the Bournemouth label as less important than the hotel I am due to stay in.

A human map maker would undoubtedly have simply shifted the name ‘Bournemouth’ up a bit, knowing that it refers to the whole town.  In principle, Google Maps could do the same, but geocoding databases (e.g. GeoNames) typically give a single point for each location rather than an area, so it is not easy for the software to make such adjustments … except that Google clearly knows Bournemouth is ‘big’, as it is displayed on the first, zoomed-out view; so maybe it could have done better.
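Under the hood this is a classic label-placement problem.  Here is a minimal sketch of the greedy, priority-based behaviour Google appears to exhibit (my guess from the screenshots, not its actual algorithm); the names, priorities and boxes are made up:

def overlaps(a, b):
    # axis-aligned bounding boxes (x, y, width, height)
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_labels(labels):
    # labels: (priority, name, bbox); highest priority placed first,
    # anything that would overlap an already-placed label is dropped.
    # A cartographer would instead try shifted positions before giving up.
    placed = []
    for _, name, bbox in sorted(labels, key=lambda l: -l[0]):
        if not any(overlaps(bbox, other) for _, other in placed):
            placed.append((name, bbox))
    return [name for name, _ in placed]

print(place_labels([
    (5, "Bournemouth", (100, 100, 90, 20)),
    (9, "Hotel Miramar", (110, 105, 110, 20)),  # personal result outranks the town
    (5, "Poole", (10, 90, 50, 20)),
]))  # -> ['Hotel Miramar', 'Poole']: Bournemouth is silently dropped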

This problem of overlapping legends will be familiar to anyone involved in visualisation whether map based or more abstract.

cone-trees

The image above is the original Cone Tree hierarchy browser developed at Xerox PARC in the early 1990s [1].  These were the early days of interactive 3D visualisation, and the Cone Tree exploited many of its advantages, such as a larger effective ‘space’ in which to place objects, and shadows giving both depth perception and a level of overview.  However, there was no room for text labels without them all running over each other.

Enter the Cam Tree:

cam-tree

The Cam Tree is identical to the Cone Tree, except that because it is on its side it is easier to place labels without them overlapping 🙂

Of course, with the Cam Tree the regularity of the layout makes it easy to have a single solution.  The problem with maps is that labels can appear anywhere.

This is an image of a particularly cluttered part of the Frasan mobile heritage app developed for the An Iodhlann archive on Tiree.  Multiple labels overlap, making them unreadable.  I should note that the large number of names only appears when the map is zoomed in; but when they do appear, there are clearly too many.

frasan-overlap

It is far from clear how best to deal with this.  The Google solution was simply not to show some things, but as we’ve seen that can be confusing.

Another option would be to make the level of detail that appears depend not just on the zoom, but also on the local density.  In the Frasan map the locations of artefacts are not shown when zoomed out and only appear when zoomed in; they could instead appear, at first, only in the less cluttered areas, and appear in busier areas only when the map is zoomed in sufficiently for them to space out.   This would trade clutter for inconsistency, but might be worthwhile.  The bigger problem would be knowing whether there were more things to see.
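A minimal sketch of this density rule (the radius, neighbour limit and scaling are illustrative assumptions, not anything from the Frasan app):

def visible_markers(points, zoom, radius_px=30, max_neighbours=3):
    # points: (x, y) in map units; zoom: pixels per map unit.
    # A marker is shown only when few enough neighbours crowd it on
    # screen; zooming in stretches pixel distances, so markers
    # gradually 'switch on' in the busier areas.
    shown = []
    for i, (x, y) in enumerate(points):
        near = sum(
            1 for j, (px, py) in enumerate(points)
            if j != i and ((px - x) ** 2 + (py - y) ** 2) ** 0.5 * zoom < radius_px
        )
        if near <= max_neighbours:
            shown.append((x, y))
    return shown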

Another solution is to group things in busy areas.  The two maps below are from house-listing sites.  The first is Rightmove, which uses a Google map for its map view.  Note how the house icons all overlap one another.  Of course, the nature of houses means that if you zoom in sufficiently they start to separate, but the initial view is very cluttered.  The second is daft.ie; note how some houses are shown individually, but when they get too close they are grouped together, with just the number of houses in the group shown (a sketch of this kind of grouping follows the images).

rightmove-houses  daft-ie-house-site
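A sketch of the kind of grid-based grouping daft.ie appears to use (inferred from its visible behaviour, not its actual implementation): markers that fall into the same screen-space cell at the current zoom collapse into a single count badge.

from collections import defaultdict

def cluster(points, zoom, cell_px=60):
    # points in map units; zoom in pixels per map unit
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x * zoom // cell_px), int(y * zoom // cell_px))].append((x, y))
    markers = []
    for members in cells.values():
        if len(members) == 1:
            markers.append(("house", members[0]))
        else:
            # one badge at the centroid, labelled with the group size
            cx = sum(p[0] for p in members) / len(members)
            cy = sum(p[1] for p in members) / len(members)
            markers.append(("group of %d" % len(members), (cx, cy)))
    return markers

Zooming in increases the pixel spread, so groups naturally break apart into individual icons, which is exactly the behaviour seen on the daft.ie map.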

A few years ago, Geoff Ellis and I reviewed a number of clutter reduction techniques [2], each with advantages and disadvantages; there is no single ‘best’ answer. The daft.ie grouping solution works for icons, which are small and of fixed size; the text-label layout problem is far harder!

Maybe someday these automatic tools will be able to cope with the full variety of layout problems that arise, but for the time being this is one area where human cartographers still know best.

  1. Robertson, G. G.; Mackinlay, J. D.; Card, S. K. Cone Trees: animated 3D visualizations of hierarchical information. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’91); 1991 April 27 – May 2; New Orleans, LA. New York: ACM; 1991; 189–194.[back]
  2. Geoffrey Ellis and Alan Dix. 2007. A Taxonomy of Clutter Reduction for Information Visualisation. IEEE Transactions on Visualization and Computer Graphics 13, 6 (November 2007), 1216–1223. DOI=10.1109/TVCG.2007.70535[back]

Walking Wales

As some of you already know, next year I will be walking all around Wales: from May to July covering just over 1000 miles in total.

Earlier this year the Welsh Government announced the opening of the Wales Coastal Path, a new long-distance footpath around the whole coast of Wales. There were several existing long-distance paths covering parts of the coastline, as well as numerous stretches of public footpaths at or near the coast. However, these have now been linked, mapped and waymarked, creating, for the first time, a single continuous route. In addition, the existing Offa’s Dyke long-distance path runs very close to the Welsh–English border, so that it is possible to make a complete circuit of Wales on the two paths combined.

As soon as I heard the announcement, I knew it was something I had to do, and gradually, as I discussed it with more and more people, the idea has become solid.

This will not be the first complete periplus along these paths; this summer there have been at least two sponsored walkers taking on the route. However, I will be doing the walk with a technology focus, which will, I believe, be unique.

The walk has four main aspects:

personal — I am Welsh, was born and brought up in Cardiff, but have not lived in Wales for over 30 years. The walk will be a form of homecoming, reconnecting with the land and its people that I have been away from for so long. The act of encircling can symbolically ‘encompass’ a thing, as if knowing the periphery one knows the whole. Of course life is not like this, the edge is just that, not the core, not the heart. As a long term ex-pat, a foreigner in my own land, maybe all I can hope to do is scratch the surface, nibble at the edges. However, also I always feel most comfortable as an outsider, as one at the margins, so in some ways I am going to the places where I most feel at home. I will blog, audio blog, tweet and generally share this experience to the extent the tenuous mobile signal allows, but also looking forward to periods of solitude between sea and mountain.

practical — As I walk I will be looking at the IT experience of the walker, and will also discuss with local communities the IT needs and problems of those at the edges, at the margins. Not least will be issues due to the paucity of network access: both patchy mobile signal whilst walking and low-capacity ‘broadband’ at the limits of wind-beaten copper telephone wires — none of the mega-capacity fibre optic of the cities. This will not simply be fact-finding, but actively building prototypes and solutions, both myself (in evenings and ‘days off’) and with others who are part of the project remotely or joining me for legs of the journey [1]. Geolocation and mobile-based applications will be a core part of this, particularly for the walker’s experience, but local community needs are likely to be far more diverse.

philosophical — Mixed with personal reflections will be an exploration of the meanings of place, of path, of walking, of nomadicity and of locality. Aristotle’s school of philosophy was called the Peripatetic School because discussion took place while walking; over two thousand years later Wordsworth’s poetry was nearly all composed while walking; and for time immemorial routes of pilgrimage have been a focus of both spiritual service and personal enlightenment. This will build on some of my own previous writings, in particular past keynotes [2] on human understanding of space, and also wider literature such as Rebecca Solnit’s wonderful “Wanderlust”.  This reflection will inform the personal blogging, and after I finish I will edit this into a book or other account of the journey.

research [3] — the practical outcomes will intersect with various personal research interests including social empowerment, interaction design and algorithmics [4].  For the walker’s experience, I will effectively be doing a form of action research!  This will certainly include how to incorporate local maps (such as tourist town plans) effectively into larger-scale experiences, how ‘crowdsourced’ route knowledge can augment more formal digital and paper resources, data synchronisation to deal with disconnection, and data integration between diverse sources.  In addition I am offering myself as a living lab so that others can use my trip as a place to try out their own sensors and instrumentation [5], information systems, content authoring, ethnographic practices, community workshops, etc.  This may involve simply asking me to use things, coming for a single meeting or day, or joining me for parts of the walk.

If any of this interests you, do get in touch.  As well as research collaborations (living lab or supporting direct IT goals), any help in managing logistics, PR, or finding sources of funding/sponsorship for basic costs would be most welcome.

I’ll get a dedicated website, Facebook page, twitter account, and charity sponsorship set up soon … watch this space!

  1. Coding whilst walking is something I have thought about (but not done!) for many years, but I was definitely inspired more recently by Nick, the amazing cycling programmer who came to the Spring Tiree Tech Wave.[back]
  2. “Welsh Mathematician Walks in Cyberspace”, and “Paths and Patches: patterns of geognosy and gnosis”.[back]
  3. I tried to think of a word beginning with ‘p’ for research, but failed![back]
  4. As I tagged this post I found I was using nearly all my most common tags — I hadn’t realised quite how much this project cuts across so many areas of interest.[back]
  5. But with the “no blood rule”: if I get sensor sores, the sensors go in the bin 😉 [back]

Tiree going mobile

Tiree’s Historical Centre An Iodhlann has just been awarded funding by the Scottish Digital Research and Development Fund for Arts and Culture to make historic archive material available through a mobile application whilst ‘on the ground’ walking, cycling or driving around the island.

I’ve been involved in bigger projects, but I can’t recall being more excited than this one: I think partly because it brings together academic interests and local community.

the project

An Iodhlann (Gaelic for a stackyard) is the historical centre on the island of Tiree.  Tiree has a rich history, from the Mesolithic period to the Second World War base. The archive was established in 1998, and its collection of old letters, emigrant lists, maps, photographs, stories and songs now extends to 12,000 items.  Five hundred items are available online, but the rest of the primary data is only available at the centre itself.  A database of 3200 island place names collated by Dr Holliday, the chair of An Iodhlann, has recently been made available on the web at tireeplacenames.org.  Given the size of the island (~750 permanent residents) this is a remarkable asset.


To date, online access to An Iodhlann’s material has mainly been targeted at archival/historical use, although the centre itself has a more visitor-centred exhibition.  However, the existing digital content has the potential to be used for a wider range of applications, particularly to enhance the island experience for visitors.

Over the next nine months we will create a mobile application allowing visitors and local historians to access geographically pertinent information, including old photographs, and interpretative maps/diagrams, while actually at sites of interest.  This will largely use visitors’ own devices such as smart phones and tablets.  Maps will be central to the application, using both OS OpenData and bespoke local maps and sketches of historical sites.

As well as adding an extra service for those who already visit An Iodhlann, we hope that this will attract new users, especially younger tourists.  In addition, a ‘data layer’ using elements of semantic web technology will mean that the raw geo-coded information is available for third parties to mash up and for digital humanities research.

the mouse that roars

The Scottish Digital Research and Development Fund for Arts and Culture is run by Nesta, Creative Scotland and the Arts and Humanities Research Council (AHRC).

This was a highly competitive process with 52 applications, of which just 6 were funded.  The other successful organisations are: The National Piping Centre, Lyceum Theatre Company and the Edinburgh Cultural Quarter, Dundee Contemporary Arts, National Galleries of Scotland, Glasgow Film Theatre and Edinburgh Filmhouse.  These are all big-city organisations, as were the projects funded by an earlier similar programme run by Nesta in England.

As the only rural-based project, this is a great achievement for Tiree and a great challenge for us over the next nine months!

challenges

In areas of denser population or high overall tourist numbers, historical or natural sites attract sufficient visitors to justify full-time (volunteer or paid) staff.  In more remote rural locations or on small islands, there are neither sufficient people for volunteers to cover all, or even a significant number, of the sites, nor sufficient tourist volume to justify commercial visitor centres.

A recent example of this on Tiree is the closure of the Thatched Cottage Museum.  This is one of the few remaining thatched houses on the island, and housed a collection of everyday historical artefacts.  It was owned by the Hebridean Trust and staffed by local volunteers, but was recently closed and the building sold, as it proved difficult to keep it staffed sufficiently given the visitor numbers.

At some remote sites such as the Tiree chapels, dating back to the 10th century, or Iron Age hill forts, there are simple information boards and at a few locations there are also fixed indoor displays, including at An Iodhlann itself.  However, there are practical and aesthetic limits on the amount of large-scale external signage and limits on the ongoing running and maintenance of indoor exhibits.  Furthermore, limited mobile signals mean that any mobile-based solutions cannot assume continuous access.

from challenge to experience

Providing information on visitors’ own phones or tablets will address some of the problems of lack of signage and human guides.  However, achieving this without effective mobile coverage means that simple web-based solutions will not work.

The application used whilst on the ground will need to be downloaded, but this limits the total amount of information that can be available whilst mobile.  Our first app will be built using HTML5 to ensure it is available on the widest range of mobile devices (iOS, Android, Windows Mobile, ordinary laptops), but using HTML5 further reduces the local storage available [1].

In order to deal with this, the on-the-ground experience will be combined with a website allowing pre-trip planning and post-trip reminiscence.  This will also be map-focused, allowing visitors to see where they have been or are about to go, and to access additional resources, such as photos and audio files, that are too large to be available on the ground (remembering the poor mobile coverage). This may also offer an opportunity to view social content, including comments or photographs from previous visitors, and then to associate one’s own photographs taken during the day with the different sites, creating a personal diary that can be shared with others.

On reflection, this focus on preparation and reminiscence will create a richer and more extended experience than simply providing information on demand.  Rather than reading reams of on-screen text whilst looking at a monument, or attempting to hear an audio recording in the Tiree wind, visitors will have some information available in the field and more when they return to their holiday base, or home [2].


  1. For some reason HTML5 applications are restricted to a maximum of 5MB of local storage![back]
  2. This is another example of a lesson I have seen so many times before: the power of constraints to force more innovative and better designs. So many times I have heard people say about their own designs “I wanted to make X, but couldn’t for some reason so did Y instead” and almost every time it is the latter, the resource-constrained design, that is clearly so much better.[back]

A month away: brain engaged and blood on the floor

Writing at Glasgow airport, waiting for the flight home after nearly a whole month away. I have had a really productive time, first at Talis HQ and Lancs (all in the camper van!) and then visits to Southampton (experience design and semantic web), Athens (ontologies and brain-like computation) and Konstanz (visualisation and visual analytics).

Loads of intellectual stimulation, but now really looking forward to some time at home to consolidate a little.

During my time away I managed to fall downstairs, bleed profusely over the hotel floor, and break a tooth. My belongings didn’t fare any better: my glasses fell apart and my sandals and suitcase are now held together by threads … so maybe it’s safer at home for a bit!

Web Art/Science Camp — how web killed the hypertext star and other stories

Had a great day on Saturday at the Web Art/Science Camp (twitter: #webartsci, lanyrd: web-art-science-camp). It was the first event I went to primarily with my Talis hat on, and my first Web Science event, so I was very pleased that Clare Hooper told me about it during the DESIRE Summer School.

The event started on Friday night with a lovely meal in the restaurant at the British Museum. The museum was partially closed in the evening, but in the open galleries the Rosetta Stone, the Elgin Marbles and a couple of enormous totem poles were all very impressive. … and I notice the BM’s website, where it describes the Parthenon Sculptures, does not waste the opportunity to tell us why they should not be returned to Greece!

Treasury of Atreus

I was fascinated too by images of the “Treasury of Atreus” (which is actually a Greek tomb, also known as the Tomb of Agamemnon). The tomb has a corbelled arch (triangular stepped stones, as visible in the photo) in order to relieve the load on the lintel. However, whilst the corbelled arch was an important technological innovation, the aesthetics of the time meant they covered up the triangular opening with thin slabs of fascia stone and made it look as though the lintel was actually supporting the wall above — rather like modern concrete buildings with decorative classical columns.

how web killed the hypertext star

On Saturday, the camp proper started with Paul de Bra from TU/e giving a sort of retrospective on pre-web hypertext research and whether there is any need for hypertext research anymore. The talk brought out several of the issues that have worried me for some time: so many of the lessons of early hypertext lost in the web [1].

For me one of the most significant issues is external linkage. HTML embeds links in the document using <a> anchor tags, so that only the links the author has thought of can be present (and only one link per anchor). In contrast, mature pre-web hypertext systems, such as Microcosm [2], specified links externally to the document, so that third parties could add annotations and links. I had a few great chats about this with one of the Southampton Web Science DTC students; in particular, about whether Google or Wikipedia effectively provide all the external links one needs.

Paul’s brief history of hypertext started, predictably, with Vannevar Bush‘s “As We May Think” and Memex; however, he pointed out that Bush’s vision was based on associative connections (like the human mind) and trails (a form of narrative), not pairwise hypertext links. The latter reminded me of Nick Hammond’s bus-tour metaphor for guided educational hypertext in the 1980s — occasionally since I have seen things a little like this, and indeed narrative was an issue that arose in different guises throughout the day.

While Bush’s trails are at least related to the links of later hypertext and the web, the idea of associative connections seems to have been virtually forgotten.  More recently on the web, however, IR (information retrieval) based approaches for page suggestions like Alexa, and content-based social networking, have elements of associative linking, as does the use of spreading activation in web contexts [3].
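For a flavour of the technique, here is a minimal sketch of spreading activation over a weighted graph; this is the generic idea, not the specific algorithms in the papers cited in the footnote:

def spread(graph, seeds, decay=0.5, rounds=3):
    # graph: {node: [(neighbour, weight), ...]}; seeds: {node: activation}.
    # Activation flows outward from the seeds, attenuated by edge weight
    # and a decay factor; after a few rounds, high-activation nodes are
    # those most 'associatively' connected to the seeds.
    activation = dict(seeds)
    for _ in range(rounds):
        incoming = {}
        for node, value in activation.items():
            for neighbour, weight in graph.get(node, []):
                incoming[neighbour] = incoming.get(neighbour, 0.0) + value * weight * decay
        for node, value in incoming.items():
            activation[node] = max(activation.get(node, 0.0), value)
    return sorted(activation.items(), key=lambda kv: -kv[1])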

It was of course Nelson who coined the term hypertext, but Paul reminded us that Ted Nelson’s vision of hypertext in Xanadu is far richer than the current web.  As well as external linkage (and indeed more complex forms in his ZigZag structures, a form of faceted navigation), Xanadu’s linking was often in the form of transclusions: pieces of one document appearing, quoted, in another. Nelson was particularly keen on having only one copy of anything, hence the transclusion is not so much a copy as a reference to a portion. The idea of having exactly one copy seems a bit of a computing obsession, and in non-technical writing it is common to have quotations that are in some way edited (elision, emphasis); but the core thing to me is that the target of a link, as well as the source, need not be a whole document but some fragment of it.

Paul de Bra's keynote at Web Art/Science Camp (photo Clare Hooper)

Over a period of 30 years hypertext developed and started to mature … until in the early 1990s came the web, and so much of hypertext died with its birth … I guess a bit like the way Java all but stultified programming languages. Paul had a lovely list of bad things about the web compared with (1990s) state-of-the-art hypertext:

Key properties/limitations in the basic Web:

  1. uni-directional links between single nodes
  2. links are not objects (have no properties of their own)
  3. links are hardwired to their source anchor
  4. only pre-authored link destinations are possible
  5. monolithic browser
  6. static content, limited dynamic content through CGI
  7. links can break
  8. no transclusion of text, only of images

Note that 1, 3 and 4 are all connected with the way that HTML embeds links in pages rather than adopting some form of external linkage. However, 2 is also interesting: the fact that links are not ‘first-class objects’. This has been preserved in the semantic web, where an RDF triple is not itself easily referenced (except by complex ‘reification’), and so it is hard to add information about relationships, such as provenance.
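To make the point concrete, here is a sketch using Python’s rdflib of what it takes to say anything about a link: reification needs a statement resource and four auxiliary triples before the provenance itself can be added (the ex: namespace and assertedBy property are made up for illustration).

from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

link = (EX.pageA, EX.linksTo, EX.pageB)
g.add(link)

stmt = BNode()                          # the link as a 'first-class object'
g.add((stmt, RDF.type, RDF.Statement))  # four triples of pure bookkeeping...
g.add((stmt, RDF.subject, link[0]))
g.add((stmt, RDF.predicate, link[1]))
g.add((stmt, RDF.object, link[2]))
g.add((stmt, EX.assertedBy, Literal("alan")))  # ...before the provenance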

Of course, this same simplicity (or even simplistic quality) that reduced the expressivity of HTML compared with earlier hypertext is also the reason for its success compared with earlier, more heavyweight and usually centralised solutions.

However, Paul went on to describe how many of the features that were lost have re-emerged in plugins and server enhancements (this made me think of systems such as zLinks, which start to add an element of external linkage). I wasn’t totally convinced, as these features are still largely in research prototypes and have not entered the mainstream, but it made a good end to the story!

demos and documentation

There was a demo session as well as some short demos as part of talks. Lots of interesting ideas. One that particularly caught my eye (although not incredibly webby) was Ana Nelson‘s documentation generator “dexy” (not to be confused with doxygen, another documentation generator). Dexy allows you to include code and output, including screenshots, in documentation (LaTeX, HTML, even Word if you work a little) and live-updates the documentation as the code updates (at least it updates the code and output; you need to change the words!). It seems to be both a test harness and a multi-version documentation compiler all in one!

I recall that many years ago, while he was still at York, Harold Thimbleby was doing something a little similar when he was working on his C version of Knuth’s WEB literate programming system. Ana’s system is language-neutral and takes advantage of recent developments, in particular the use of VMs to be able to test install scripts and to be sure of running code in a consistent environment. Also it can use browser automation for web docs — very cool 🙂

Relating back to Paul’s keynote, this is exactly an example of Nelson’s transclusion — the code and outputs included in the document but still tied to their original source.

And on this same theme I demoed Snip!t as an example of both:

  1. attempting to bookmark parts of web pages, a form of transclusion
  2. using data detectors, a form of external linkage

Another talk/demo showed how Compendium could be used to annotate video (in this case regarding fashion design) and build rationale around it … yet another example of external linkage in action.

… and when looking, after the event, at some of Weigang Wang‘s work on collaborative hypermedia, it was pleasing to see that it uses a theoretical framework for shared understanding in collaborative hypermedia that builds upon my own CSCW framework from the early 1990s 🙂

sessions: narrative, creativity and the absurd

Impossible to capture in a few words, but one session included different talks and discussion about the relation of narrative to various forms of web experience — including a talk on the cognitive psychology of the Kafkaesque.  There was also discussion of creativity, with Nathan live-recording in IBIS!

what is web science

I guess inevitably in a new area there was some discussion about “what is web science” and even “is web science a discipline”.  I recall similar discussions about the nature of HCI 25 years ago, not entirely resolved today … and, as an artist who was there reminded us, they still struggle with “what is art?”!

Whether or not there is a well-defined discipline of ‘web science’, the web definitely throws up new issues for many disciplines, including new challenges for computing in terms of scale, and new opportunities for the social sciences in terms of intrinsically documented social interactions. One of the themes that recurred to distinguish web science from simply web technology is the human element — joy to my ears, of course, as an HCI man, but I think maybe not the whole story.

Certainly the gathering of people from different backgrounds in a sort of disciplinary bohemia is exciting whether or not it has a definition.

  1. see also “Names, URIs and why the web discards 50 years of computing experience“[back]
  2. Wendy Hall, Hugh Davis and Gerard Hutchings, “Rethinking Hypermedia: The Microcosm Approach”, Springer, 1996.[back]
  3. Spreading activation is used by a number of people, some of my own work with others at Athens, Rome and Talis is reported in “Ontologies and the Brain: Using Spreading Activation through Ontologies to Support Personal Interaction” and “Spreading Activation Over Ontology-Based Resources: From Personal Context To Web Scale Reasoning“.[back]

endings and beginnings: cycling, HR and Talis

It is the end of the summer, the September rush starts (actually at the end of August), and on Friday I’ll be setting off on the ferry, to be away from home for all of September and October 🙁  Of course I didn’t manage to accomplish as much as I wanted over the summer, and didn’t get away on holiday … except of course living next to the sea is sort of like holiday every day!  However, I did take some time off when Miriam visited, joining her on cycle rides to start her training for her Kenyan challenge — neither of us had been on a bike for 10 years!  Also this last weekend saw the world come to Tiree when a group of asylum seekers and refugees from the St Augustine Centre in Halifax visited the Baptist Church here — kite making, songs from Zimbabwe and loads of smiling faces.

In September I also hand over departmental personnel duty (good luck Keith :-)).  I’d taken on the HR role before my switch to part-time at the University, and so most of it stayed with me through the year 🙁 (Note: if you ever switch to part-time, better to do so before duties are arranged!)  Not sorry to see it go; the people bit is fine, but so much form filling!

… and beginnings … in September (next week!) I also start to work part-time with Talis.  Talis is a remarkable story: a library information systems company that re-invented itself as a Semantic Web company and now, amongst other things, powers the Linked Data at data.gov.uk.

I’ve known Talis as a company from its pre-SemWeb days, when aQtive did some development for them as part of our bid to survive the post-dot-com crash.   aQtive did in the end die, but Talis had stronger foundations and has thrived [1].  In the years afterwards two ex-aQtive folk, Justin and Nad, went to Talis, and for the past couple of years I have also been on the external advisory group for their SemWeb platform.  So I will be joining old friends as well as being part of an exciting enterprise.

  1. Libraries literally need very strong foundations.  I heard of one university library that had to be left half empty because the architect had forgotten to take account of the weight of books.  As the shelves filled the whole building began to sink into the ground.[back]

Italian conferences: PPD10, AVI2010 and Search Computing

I got back from a trip to Rome and Milan last Tuesday. This included the PPD10 workshop that Aaron, Lucia, Sri and I had organised, and the AVI 2010 conference, both at the University of Rome “La Sapienza”, and a one-day workshop on Search Computing at Milan Polytechnic.

PPD10

The PPD10 workshop on Coupled Display Visual Interfaces [1] followed on from a previous event, PPD08 at AVI 2008, and also a workshop on “Designing And Evaluating Mobile Phone-Based Interaction With Public Displays” at CHI 2008.  The linking of public and private displays is something I’ve been interested in for some years, and it was exciting to see some of the kinds of scenarios discussed at Lancaster as potential futures some years ago now being implemented over a range of technologies.  Many of the key issues and problems proposed then are still to be resolved, and new ones are arising, but certainly it seems the technology is ‘coming of age’.  As well as much work filling in the space of interactions, there were also papers that pushed some of the existing dimensions/classifications; in particular, Rasmus Gude’s paper on “Digital Hospitality” stretched the public/private dimension by considering the appropriation of technology in the home by house guests.  The full proceedings are available at the PPD10 website.

AVI 2010

AVI is always a joy, and AVI 2010 was no exception: a biennial, single-track conference with high-quality papers (20% acceptance rate this year), and always in lovely places in Italy with good food and good company!  I first went to AVI in 1996, when it was in Gubbio, to give a keynote “Closing the Loop: modelling action, perception and information“, and have gone every time since — I always say that Stefano Levialdi is a bit like a drug pusher: the first experience for free and ever after you are hooked! The high spot this year was undoubtedly Hitomi Tsujita‘s “Complete Fashion Coordinator” [2], a system for using social networking to help choose clothes to wear — partly just fun with a wonderful video, but also a very thoughtful mix of physical and digital technology.


images from Complete Fashion Coordinator

The keynotes were all great: Daniel Keim gave a really lucid state-of-the-art in Visual Analytics (more later), and Patrick Lynch a fresh view of visual understanding based on many years’ experience, highlighting particularly some of the more immediate ‘gut’ reactions we have to interfaces.  Daniel Wigdor gave an almost blow-by-blow account of work at Microsoft on developing interaction methods for next-generation touch-based user interfaces.  His paper is a great methodological exemplar for researchers, combining very practical considerations, more principled design space analysis and targeted experimentation.

Looking more at the detail of Daniel’s work at Microsoft, it is interesting that he has a harder job than Apple’s interaction developers.  While Apple can design the hardware and interaction together, Microsoft as system provider needs to deal with very diverse hardware, leading to a ‘least common denominator’ approach at the level of quite basic touch interactions.  For walk-up-and-use systems such as Microsoft Surface in bar tables, this means that users have a consistent experience across devices.  However, I did wonder whether this approach, which is basically the presentation/lexical level of Seeheim, was best, or whether it would be better to settle on some higher-level primitives more at the Seeheim dialogue level, thinking particularly of the way the iPhone turns pull-down menus from web pages into spinning selectors.  For devices that people own, it may be that these more device-specific variants of common logical interactions allow a richer user experience.

The complete AVI 2010 proceedings (in colour or B&W) can be found at the conference website.

The very last session of AVI was a panel I chaired on “Visual Analytics: people at the heart of data” with Daniel Keim, Margit Pohl, Bob Spence and Enrico Bertini (in the order they sat at the table!).  The panel was prompted largely because the EU VisMaster Coordinated Action is producing a roadmap document looking at future challenges for visual analytics research in Europe and elsewhere.  I had been worried that it could be a bit dead at 5pm on the last day of the conference, but it was a lively discussion … and Bob served well as the enthusiastic but also slightly sceptical outsider to VisMaster!

As I write this, there is still time (just, literally weeks!) for final input into the VisMaster roadmap and if you would like a draft I’ll be happy to send you a PDF and even happier if you give some feedback 🙂

Search Computing

I was invited to this one-day workshop and had the joy of travelling up on the train from Rome with Stu Card and his daughter Gwyneth.

The search computing workshop was organised by the SeCo project. This is a large single-site project (around 25 people for 5 years) funded as one of the EU’s ‘IDEAS Advanced Grants’ supporting ‘investigation-driven frontier research’.  It is really good to see the EU funding work at the bleeding edge, as so many national and European projects end up being ‘safe’.

The term search computing was entirely new to me, although it instantly brought several concepts to mind.  In fact the principal focus of SeCo is the bringing together of information in deep web resources, including combining result rankings; in database terms, a form of distributed join over heterogeneous data sources.

The work had many personal connections, including work on concept classification using ODP data dating back to aQtive days, as well as onCue itself and Snip!t.  It also has similarities with linked data in the semantic web world, however with crucial differences.  SeCo’s service approach uses meta-descriptions of the services to add semantics, whereas linked data in principle includes a degree of semantics in the RDF data itself.  Also, the ‘join’ on services is on values and so uses a degree of run-time identity matching (Stu Card’s example was how to know that LA = ‘Los Angeles’), whereas linked data relies on URIs, so (again in principle) matching has already been done during data preparation.  My feeling is that linking the two paradigms would be very powerful, and even for certain kinds of raw data, such as tables, external semantics seems sensible.

One of the real opportunities for both is to harness user interaction with data as an extra source of semantics.  For example, for the identity matching issue, if a user is linking two data sources and notices that ‘LA’ and ‘Los Angeles’ are not identified, this can be added as part of the interaction to serve the user’s own purposes at that time, but in so doing adds a special case that can be used for the benefit of future users.
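A toy sketch of that loop (an illustration of the idea, not SeCo’s design): a user-confirmed match goes into an alias table, which canonicalises keys for all later value-based joins.

aliases = {"la": "los angeles"}   # seeded by earlier users' confirmations

def canonical(value):
    v = value.strip().lower()
    return aliases.get(v, v)

def join(left, right):
    # left, right: lists of (key, payload); join on canonicalised keys
    index = {}
    for key, payload in right:
        index.setdefault(canonical(key), []).append(payload)
    return [(lp, rp) for key, lp in left for rp in index.get(canonical(key), [])]

def confirm_match(seen, meant):
    # called when a user links two values the system treated as distinct
    aliases[seen.strip().lower()] = canonical(meant)

flights = [("LA", "flight 101")]
hotels = [("Los Angeles", "Hotel Figueroa")]
print(join(flights, hotels))   # -> [('flight 101', 'Hotel Figueroa')]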

While SeCo is predominantly focused on search federation, the broader issue of using search as part of algorithmics is also fascinating.  Traditional algorithmics assumes that knowledge is basically in code or rules and is applied to data.  In contrast, we are seeing the rise of web algorithmics, where knowledge is garnered from vast volumes of data.  For example, Gianluca Demartini at the workshop mentioned that his group had used the Google Suggest API to extend keywords, and I’ve seen the same trick used previously [3].  To some extent this is like classic techniques of information retrieval, but whereas IR is principally focused on a closed document set, here the document set is being used to establish knowledge that can be used elsewhere.  In work I’ve been involved with, both the concept classification and the folksonomy mining with Alessio apply this same broad principle.

The slides from the workshop are appearing (but not all there yet!) at the workshop web page on the SeCo site.

  1. yes, I know this doesn’t give ‘PPD’; it stands for “public and private displays”[back]
  2. Hitomi Tsujita, Koji Tsukada, Keisuke Kambara, Itiro Siio, Complete Fashion Coordinator: A support system for capturing and selecting daily clothes with social network, Proceedings of the Working Conference on Advanced Visual Interfaces (AVI2010), pp.127–132.[back]
  3. The Yahoo! Related Suggestions API offers a similar service.[back]

data types and interpretation in RDF

After following a link from one of Nad’s tweets, I read Jeni Tennison’s “SPARQL & Visualisation Frustrations: RDF Datatyping“.  Jeni had been having problems processing RDF of MPs’ expense claims, because the amounts were plain RDF strings rather than typed numbers.  She suggests some best-practice rules for datatypes in RDF, based on the underlying philosophy of RDF that it should be self-describing:

  • if the literal is XML, it should be an XML literal
  • if the literal is in a particular language (such as a description or a name), it should be a plain literal with that language
  • otherwise it should be given an appropriate datatype

These seem pretty sensible for simple data types.
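For simple cases the rules translate directly into code.  A sketch using Python’s rdflib (the namespace and properties are made up; the amounts echo Jeni’s expense data):

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/")
g = Graph()

# a description in a particular language: plain literal with a language tag
g.add((EX.claim1, EX.description, Literal("Second home mortgage", lang="en")))
# an amount: typed literal, so SPARQL can filter and sort numerically
g.add((EX.claim1, EX.amount, Literal("12345.67", datatype=XSD.decimal)))
# what Jeni actually found: bare strings, invisible to numeric filters
g.add((EX.claim2, EX.amount, Literal("12345 (2004)")))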

In work on the TIM project with colleagues in Athens and Rome, we too had issues with representing data types in ontologies, but more to do with the status of a data type.  Is a date a single thing “2009-08-03T10:23+01:00”, or is it a compound [[date year=”2009″ month=”8” …]]?

I just took a quick peek at how Dublin Core handles dates, and see that the closest-to-standard references [1] still include dates as ‘bare’ strings with implied semantics only, although one of the most recent documents does say:

“It is recommended that RDF applications use explicit rdf:type triples …”

and David McComb’s “An OWL version of the Dublin Core” gives an alternative OWL ontology for DC that does include an explicit type for dc:date:

<owl:DatatypeProperty rdf:about="#date">
  <rdfs:domain rdf:resource="#Document"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
</owl:DatatypeProperty>

Our solution to the compound types has been to have “value classes” which do not represent ‘things’ in the world, similar to the way the RDF for vCard represents complex elements such as names using blank nodes:

<vCard:N rdf:parseType="Resource">
  <vCard:Family> Crystal </vCard:Family>
  <vCard:Given> Corky </vCard:Given>
  ...
</vCard:N>

From [2]

This is fine, and we can have rules for parsing and formatting dates as compound objects to and from, say, W3C datetime strings.  However, this conflicts with the desire to have self-describing RDF, as these formatting and parsing rules have to be available to any application, or be present as reasoning rules in RDF stores.  If Jeni had been trying to use RDF data coded like this, she would be cursing us!
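A sketch of the problem using Python’s rdflib (illustrative, not the actual TIM ontology): the compound date is easy to represent, but the rule that turns it back into a standard date string lives outside the RDF itself.

from rdflib import Graph, BNode, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/")
g = Graph()

date = BNode()   # a 'value class': not a thing in the world
g.add((EX.doc1, EX.created, date))
g.add((date, EX.year, Literal(2009, datatype=XSD.integer)))
g.add((date, EX.month, Literal(8, datatype=XSD.integer)))
g.add((date, EX.day, Literal(3, datatype=XSD.integer)))

def to_date_string(graph, node):
    # the parsing/formatting knowledge every consuming application must share
    y = graph.value(node, EX.year).toPython()
    m = graph.value(node, EX.month).toPython()
    d = graph.value(node, EX.day).toPython()
    return "%04d-%02d-%02d" % (y, m, d)

print(to_date_string(g, date))   # -> 2009-08-03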

This tension between representations of things (dates, names) and more semantic descriptions is also evident in other areas.  Looking again at Dublin Core, the metamodel allows a property such as “subject” to have a complex object with a URI and possibly several string values.

Very semantic, but it hardly mashes well with sources that just say <dc:subject>Biology</dc:subject>.  Again a reasoning store could infer one from the other, but we still have issues about where the knowledge for such transformations resides.

Part of the problem is that the ‘self-describing’ nature of RDF is a bit illusory.   In (Peircean) semiotics the interpretant of a sign is crucial: representations are interpreted by an agent in a particular context assuming a particular language, etc.  We do not expect human language to be ‘self-describing’ in the sense of being totally acontextual.  Similarly, in philosophy, words and ideas are treated as intentional, in the (not standard English) sense that they refer out to something else; however, the binding of the idea to the thing it refers to is not part of the word, but separate from it.  Effectively, the desire to be self-describing runs the risk of ignoring this distinction [3].

Leigh Dodds commented on Jeni’s post to explain that the reason the expense amounts were not numbers was that some were published in non-standard ways, such as “12345 (2004)”.  As an example this captures succinctly the perpetual problem between representation and abstracted meaning.  If a journal article was printed in the “Autumn 2007” issue of a quarterly magazine, do we express this as <dc:date>2007</dc:date> or <dc:date>2007-10-01</dc:date>, attempting to give an approximation or inference from the actual represented date?

This makes one wonder whether what is really needed here is a meta-description of the RDF source (not simply the OWL, as one wants to talk about the use of dc:date or whatever in a particular context) that can say things like “mainly numbers, but also occasional non-standard forms”, or “amounts sometimes refer to different years”.  Of course, to be machine-mashable there would need to be an ontology for such annotation …

  1. see “Expressing Simple Dublin Core in RDF/XML“, “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” and Stanford DC OWL[back]
  2. Renato Iannella, Representing vCard Objects in RDF/XML, W3C Note, 22 February 2001.[back]
  3. Doing a quick web seek, these issues are discussed in several places, for example: Glaser, H., Lewy, T., Millard, I. and Dowling, B. (2007) On Coreference and the Semantic Web, (Technical Report, Electronics & Computer Science, University of Southampton) and Legg, C. (2007). Peirce, meaning and the semantic web (Paper presented at Applying Peirce Conference, University of Helsinki, Finland, June 2007). [back]

the more things change …

I’ve been reading Jeni (Tennison)’s Musings about techie web stuff: XML, RDF, etc.  Two articles particularly caught my eye.  One was Versioning URIs, about URIs for real-world and conceptual objects (schools, towns), and in particular how to deal with the fact that these change over time.  The other was Working With Fragmented Overlapping Markup, all about managing multiple hierarchies of structure for the same underlying data.

In the past I’ve studied issues both of versioning and of multiple structures on the same data [1], and Jeni lays out the issues for both really clearly. However, both topics gave me a sense of deja vu, not just because of my own work, but because they reminded me of similar issues that go way back before the web was even thought of.

Versioning URIs and unique identifiers [2]

In my very first computing job (COBOL programming for Cumbria County Council) many, many years ago, I read an article in Computer Weekly about the choice of keys (I think for ISAM, not even relational DBs). The article argued that keys should NEVER contain anything informational, as it is bound to change. The author gave an example of standard maritime identifiers for a ship’s journey (rather like a flight number) that were based on the destination port and were supposed never to change … except when the ship was maybe moved to a different route. There is always an ‘except’, so, the author argued, keys should be non-informational.

Just a short while after reading this I was working on a personnel system for the Education Dept. and was told emphatically that every teacher had a DES code, given to them by the government, and that this code never changed. I believed them … they were my clients. However, sure enough, after several rounds of testing and demoing, when they were happy with everything, I tried a first mass import from the council’s main payroll file. Validations failed on a number of the DES numbers. It turned out that every teacher had a DES number except for new teachers, for whom the Education Dept. issued a sort of ‘pretend’ one … and of course the DES number never changed except when the real number came through. Of course, the uniqueness of the key was core to lots of the system … major rewrite :-/

The same issues occurred in many relational DBs, where the spirit (rather like RDF triples) was that a record was defined by its values, not by identity … but look at most SQL DBs today and everywhere you see unique but arbitrary identifying ids. DOIs, ISBNs, the BBC programme ids — we relearn the old lessons.
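A minimal illustration of the lesson in SQLite (a made-up schema, not the original COBOL system!): the surrogate key stays stable while the ‘meaningful’ DES number starts out missing and later changes.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE teacher (
    id INTEGER PRIMARY KEY,    -- arbitrary surrogate key, never changes
    des_number TEXT UNIQUE,    -- informational, may be absent or change
    name TEXT NOT NULL)""")
conn.execute("INSERT INTO teacher (des_number, name) VALUES (NULL, 'New Teacher')")
# the 'real' number arrives later; references via id are unaffected
conn.execute("UPDATE teacher SET des_number = 'DES1234' WHERE name = 'New Teacher'")
print(conn.execute("SELECT id, des_number FROM teacher").fetchall())  # [(1, 'DES1234')]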

Unfortunately, once one leaves the engineered world of databases or the SemWeb, neither arbitrary ids nor versioned ones entirely solve things, as many real-world entities tend to evolve rather than metamorphose; so for many purposes http://persons.org/2009/AlanDix is the same as http://persons.org/1969/AlanDix, but for others different: ‘nearly same as’ only has limited transitivity!

  1. e.g. Modelling Versions in Collaborative Work and Collaboration on different document processing platforms; quite a few years ago now![back]
  2. edited version of comments I left on Jeni’s post[back]

going SIOC (Semantically-Interlinked Online Communities)

I’ve just SIOC enabled this blog using the SIOC Exporter for WordPress by Uldis Bojars.  Quoting from the SIOC project web site:

The SIOC initiative (Semantically-Interlinked Online Communities) aims to enable the integration of online community information. SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF.

This means you can explore the blog as an RDF Graph including this post.

<sioc:Post rdf:about="http://www.alandix.com/blog/?p=176">
    <sioc:link rdf:resource="http://www.alandix.com/blog/?p=176"/>
    <sioc:has_container rdf:resource="http://www.alandix.com/blog/index.php?sioc_type=site#weblog"/>
    <dc:title>going SIOC (Semantically-Interlinked Online Communities)</dc:title>
    <sioc:has_creator>
        <sioc:User rdf:about="http://www.alandix.com/blog/author/admin/" rdfs:label="alan">
            <rdfs:seeAlso rdf:resource="http://www.alandix.com/blog/index.php?sioc_type=user&amp;sioc_id=1"/>
        </sioc:User>
    </sioc:has_creator>
...