RDF sequences … could they be more semantic?

Although triples can in principle express anything (well anything computational), this does not mean they are particularly appropriate for everything1.

RDF sequences are one of the most basic structured types and I have always found the use of rdf:_1, rdf:_2 at best clunky.  In particular I don’t like the fact that the textual form embodies the meaning.

In the RDF schema, rdf:_1, rdf:_2, etc are all instances of the class rdfs:ContainerMembershipProperty and sub-properties of rdfs:member.  However, I was also looking to see if there was some (implicitly defined) property of each of them that said which index they represented.  For example:

<rdf:_3> <rdf:isSequenceNumber> “3”

This would mean that the fact that rdf:_3 corresponded to the third element in a sequence was expressed semantically by rdf:isSequenceNumber as well as lexically in the label “_3”.

Sadly I could find no mention of this or any alternative technique to give the rdf:_nnn properties explicit semantics 🙁

This is not just me being a purist,  having explicit semantics makes it possible to express queries such as gathering together contiguous pairs in a sequence:

<ex:a> ?r1 ?a.
<ex:a> ?r2 ?b.
?r1 <rdf:hasSequenceNumber> ?index.
?r2 <rdf:hasSequenceNumber> ?index + 1.

Without explicit semantics, this would need to be expressed using string concatenation to create the labels for the relations – yuck!

Have I missed something? Is there an alternative mechanism in the RDF world that is like this or better?

Mind you I don’t see what’s wrong with a[index] … but may be that is just too simple?

  1. see also previous posts on “It-ness and identity: FOAF, RDF and RDMS” and “digging ourselves back from the Semantic Web mire“[back]

web of data practioner’s days

I am at the Web of Data Practitioners Days (WOD-PD 2008) in Vienna.  Mixture of talks and guided hands-on sessions.  I presented first half of session on “Using the Web of Data” this morning with focus (surprise) on the end user. Learnt loads about some of the applications out there – in fact Richard Cyganiak .  Interesting talk from a guy at the BBC about the way they are using RDF to link the currently disconnected parts of their web and also archives.  Jana Herwig from Semantic Web Company has been live blogging the event.

Being here has made me think about the different elements of SemWeb technology and how they individually contribute to the ‘vision’ of Linked Data.  The aim is to be able to link different data sources together.  For this having some form of shared/public vocabulary or ‘data definitions’ is essential as is some relatively uniform way of accessing data.  However, the implementation using RDF or use of SPARQL etc. seems to be secondary and useful for some data, but not other forms of data where tabular data may be more appropriate.  Linking these different representations  together seems far more important than specific internal representations.  So wondering whether there is a route to linked data that allows a more flexible interaction with existing data and applications as well as ‘sucking’ in this data into the SemWeb.  Can the vocabularies generated for SemWeb be used as meta information for other forms of information and can  query/access protocols be designed that leverage this, but include broader range of data types.

local URIs … mashing up the desktop

I’ve worried for a while about desktop URLs.

Within the web it is easy to link things together. If I want to refer to my home page I just add a link like this. However, on the desktop things are not so simple and I end up copying chunks of mail messages into the notes field in iCal rather than simply being able to link to the mail message where I arranged the meeting.

Links from the desktop to the web are easy … just use the URL … many desktop applications including mail clients and word processors will allow you to embed clickable links. Indeed it is often easier to link to a web page than to another object on the desktop! However, things get more difficult if you want to link the other way round, from a web page to a local file or resource. In my browser’s favourites I have several links to local files, but you cannot easily do the same if your bookmarks are in a web service like del.icio.us or even my own Snip!t. It is hard to seamlessly weave your desktop into the global web.

A couple of events brought this issue to a head for me.

First at the CHI workshop on PIM entitled the Disappearing Desktop, I asked if anyone knew of work in the area and I heard from Leo Sauermann that they had made some progress on this as part of the Gnowsis project. Their proposal for a Desktop URI Scheme (edited by Leo) is targeted principally at the first of the scenarios above, being able to link between things within the desktop.

The second event was at the AVI workshop on designing multi-touch interaction techniques for coupled public and private displays. During discussions abut touch-based interactions such as the Microsoft Surface or Apple iPhone, we considered scenarios where peole got together for a meeting (as we were) in a hotel bar (where we split for small group discussion) and had screens on table tops and walls, laptops, tablets, phones … and wanted to seamlessly move material between devices. Clearly an essential requirement for which is some way to identify resources across ad hoc collections of devices.

Finally I was in Athens working with George Lepouras, Akrivi Katifori and others. George had developed a Thunderbird extension to allow Snip!t to snip from mail messages … but while we could snip the text there was no way for the Snip!t page to link back to the mail message. We need full round trip URIs that link desktop and web with no distinction – URIs that can be embedded in a web page and (assuming you have the right permissions and are in an appropriate place) can be clicked and the appropriate mail message, calendar entry or whatever is opened.

Based on this and discussions we had, I drafted a discussion document on globally accessible local URIs. Any feedback very welcome.

Over the summer we hope to put together a demonstrator / reference implementation – if anyone is interested let me know.

It-ness and identity: FOAF, RDF and RDMS

Issues of ‘sameness’ are the underpinnings of any common understanding; if I talk about America, bananas or Caruso, we need to know we are talking about the ‘same’ thing.

Codd’s relational calculus was unashamedly phenomenological – if two things have the same attributes they are the same. Of course in practice, we often have things which look the same and yet we know are different: two cans of beans, two employees called David Jones. So many practical SQL database designs use unique ids as the key field of a table effectively making sure that otherwise identical rows are distinct1.

The id gives a database record identity – it is a something independent of its attributes.

I usually call this quality ‘it-ness’ and struggled to find appropriate (probably German) philosophical term to refer to it. Before we can point at something and say ‘it is a chair’, it must be an ‘it’ something we can refer to. This it-ness must be there before we consider the proeprties of ‘ot’ (legs, seat, etc.). It-ness is related to the substance/accident distinction important in medieval scholastic debate on transubstantiation, but different as the bread needs to be an ‘it’ before we can say that its real nature (substance) is different from its apparent nature (accidents).

In contrast RDF takes identity, as embodied in a URI, as its starting point. The origins of RDF are in web meta-data – talking about web pages … that is RDF is about talking about something else, and that something else has some form of (unique) identity. Although the word ‘ontology’ seems to be misused almost beyond recognition in computer science, here we are talking about true ontology. RDF assumes as a starting point it is discussing things that are, that exist, that have being. Given this of course several distinct things may have similar attributes2.

Whilst RDMS have problems talking about identity, and we often have to add artifices (like the id), to establish identity, in RDF the opposite problem arises. Often we do not have unique names even for web entities, and even less when we have RDF descriptions of people, places … or books. Nad discusses some of the problems of cleaning up book data (MARC, RDF and FRMR), part of which is establishing unique names … and really books are ‘easy’ as librarians have soent a long time thinking about idetifying them already.

FOAF (friend of a friend) is now widely used to represent personal relationships. In this WordPress blog, when I add blogroll entries it prompts for FOAF information: is this a work colleague, family, friend (but not foe or competitor … FOAF is definitely about being friendly!).

FOAF has an RDF format, but examples, both in practice … and in the XMLNS RDF specification, are not full of “rdf:about” links as are typical RDF documents. This is because, while people clearly do have unique identity, there is thankfully no URI scheme that uniquely and universally defies us3.

In practice FOAF says things like “there is a person whose name is John Doe”, or “the blog VirtualChaos is by a person who is a friend and colleague of the author of this blog”.

In terms of identity this is a blank node “the person who …”. The computational representation of the person is a placeholder, or a variable waiting to be associated with other placeholders.

In terms of phenomenological attributes, the values either do not uniquely identify an individual (here may be many John Doe’s) and the individual may have several potential values for a given attribute (John Doe may not be the body’s only name,and a person may have several email addresses).

In order to match individuals in FOAF, we typically need to make assumption: while I may have several email addresses, they are all personal, so if two people have the same email address they are the same person. Of course such reasoning is defeasible: some families share an email address, but serves as a way of performing partial and approximate matching.

I think to the semantic web purist the goal would be to have the unique personal URI. However, to my mind the incomplete, often vague and personally defined FOAF is closer to the way the real world works even when ontologically there is a unique entity in the world that is the subject. FOAF challenges simplistic assumptions and representations of both a phenomenological and ontological nature.

  1. Furthermore if you do not specify a key, RDMS are likely to treat a relation as bag rather than a set of tuples! Try inserting the same record twice.[back]
  2. For those who know their quantum mechanics RDMS records are like Fermions and obey Pauli exclusion principle, whilst RDF entities are like Bosons and several entities can exist with identical attributes.[back]
  3. As it says in The Prisoner “I am not a number” … although maybe one day soon we will all be biometrically identified and have a global URI :-/[back]

practical RDF

I just came across D2RQ, a notation (plus implementation) for mapping relational databases to RDF, developed over the last four years by Chris Bizer, Richard Cyganiak and others at Freie Universität Berlin. In a previous post, “digging ourselves back from the Semantic Web mire“, I worried about the ghetto-like nature of RDF and the need for “abstractions that make non-triple structures more like the Semantic Web”, D2RQ is exactly the sort of thing, allowing existing relational databases to be accessed (but not updated) as if they were and RDF triple stores, including full SPARQL queries.

As D2RQ has clearly been around for years, I tried to do a bit of a web search to find things the other way around – more programmer-friendly layers on top of RDF (or XML) allowing it to be manipulated with IDL-like or other abstractions closer to ‘normal’ programming. ECMAScript for XML (E4X) seems to be just this allowing reasonably easy access to XML (but I guess RDF would be ‘flat’ in this). E4X has been around a few years (standard since 2005), but as far as I can see not yet in IE (surprise!). I guess for really practical XML it would be JSON, and there’s a nice discussion of different RDF in JSON representation issues on the n2 wiki “RDF JSON Brainstorming“. However, both E4X and RDF in JSON still are just accessing RDF nicely not adding higher level structure.

Going back to the beginning I was wondering about any tools that represent RDF as SQL / RDMS in order to make it available to ‘old technology’ … but then remembered that SPARQL creates tuples not triples so, I guess, one could say that is exactly what it does :-/

Usabilty and Web2.0

Nad did a brilliant guest lecture for our undergraduate HCI class at Lancaster on Monday. His slides and blog about the lecture are at Virtual Chaos. He touched on issues of democracy vs. authority of information, dynamic content vs. accessibility and of course increasing issues of privacy on social networking sites. He also had awesome slides to using loads of Flickr photos under creative commons … community content in action not just words! Of course also touched on Web3.0 and future convergence between emergent community phenomena and structured Semantic Web technologies.

digging ourselves back from the Semantic Web mire

Discussions on the Talis Platform Advisory Group prompted me to look at some of the APIs of new Semantic-Web-like services such as Freebase1.

Freebase is interesting as its underlying representation is graph/relationship based like RDF, but its Metaweb Query Language (MQL) uses JSON which is a more programming-like whole and parts representation with arrays and slots. Facebook’s new Data Store API also has objects and associations, but does not use RDF or other obvious web technologies.

So the question is – if the closest things to Semantic Web apps on the internet don’t use SemWeb techology like RDF, SPARQL etc. … are these SemWeb techologies fit for purpose or indeed useful at all?

I think the answer is that (i) partly they are not fit for purpose – caught in a backwater by their history, but (ii) that is like all things and they are what we have got, and (iii) we can use some of the tools of computing to make them work …

Continue reading

  1. listen to Talis’ pod cast interview with Jamie Taylor Freebase’s ‘Minister of Information’ (sic).[back]