Issues of ‘sameness’ are the underpinnings of any common understanding; if I talk about America, bananas or Caruso, we need to know we are talking about the ‘same’ thing.
Codd’s relational calculus was unashamedly phenomenological – if two things have the same attributes they are the same. Of course in practice, we often have things which look the same and yet we know are different: two cans of beans, two employees called David Jones. So many practical SQL database designs use unique ids as the key field of a table effectively making sure that otherwise identical rows are distinct1.
The id gives a database record identity – it is a something independent of its attributes.
I usually call this quality ‘it-ness’ and struggled to find appropriate (probably German) philosophical term to refer to it. Before we can point at something and say ‘it is a chair’, it must be an ‘it’ something we can refer to. This it-ness must be there before we consider the proeprties of ‘ot’ (legs, seat, etc.). It-ness is related to the substance/accident distinction important in medieval scholastic debate on transubstantiation, but different as the bread needs to be an ‘it’ before we can say that its real nature (substance) is different from its apparent nature (accidents).
In contrast RDF takes identity, as embodied in a URI, as its starting point. The origins of RDF are in web meta-data – talking about web pages … that is RDF is about talking about something else, and that something else has some form of (unique) identity. Although the word ‘ontology’ seems to be misused almost beyond recognition in computer science, here we are talking about true ontology. RDF assumes as a starting point it is discussing things that are, that exist, that have being. Given this of course several distinct things may have similar attributes2.
Whilst RDMS have problems talking about identity, and we often have to add artifices (like the id), to establish identity, in RDF the opposite problem arises. Often we do not have unique names even for web entities, and even less when we have RDF descriptions of people, places … or books. Nad discusses some of the problems of cleaning up book data (MARC, RDF and FRMR), part of which is establishing unique names … and really books are ‘easy’ as librarians have soent a long time thinking about idetifying them already.
FOAF (friend of a friend) is now widely used to represent personal relationships. In this WordPress blog, when I add blogroll entries it prompts for FOAF information: is this a work colleague, family, friend (but not foe or competitor … FOAF is definitely about being friendly!).
FOAF has an RDF format, but examples, both in practice … and in the XMLNS RDF specification, are not full of “rdf:about” links as are typical RDF documents. This is because, while people clearly do have unique identity, there is thankfully no URI scheme that uniquely and universally defies us3.
In practice FOAF says things like “there is a person whose name is John Doe”, or “the blog VirtualChaos is by a person who is a friend and colleague of the author of this blog”.
In terms of identity this is a blank node “the person who …”. The computational representation of the person is a placeholder, or a variable waiting to be associated with other placeholders.
In terms of phenomenological attributes, the values either do not uniquely identify an individual (here may be many John Doe’s) and the individual may have several potential values for a given attribute (John Doe may not be the body’s only name,and a person may have several email addresses).
In order to match individuals in FOAF, we typically need to make assumption: while I may have several email addresses, they are all personal, so if two people have the same email address they are the same person. Of course such reasoning is defeasible: some families share an email address, but serves as a way of performing partial and approximate matching.
I think to the semantic web purist the goal would be to have the unique personal URI. However, to my mind the incomplete, often vague and personally defined FOAF is closer to the way the real world works even when ontologically there is a unique entity in the world that is the subject. FOAF challenges simplistic assumptions and representations of both a phenomenological and ontological nature.
- Furthermore if you do not specify a key, RDMS are likely to treat a relation as bag rather than a set of tuples! Try inserting the same record twice.[back]
- For those who know their quantum mechanics RDMS records are like Fermions and obey Pauli exclusion principle, whilst RDF entities are like Bosons and several entities can exist with identical attributes.[back]
- As it says in The Prisoner “I am not a number” … although maybe one day soon we will all be biometrically identified and have a global URI :-/[back]
I think the term you may be after is Gestalt – an entity that is more than the sum of its parts.
Indeed – what you describe here is challenging.
thanks Rob. However, even to say something is more than the sum of its parts you first have to say it is ‘something’ and it is that sort of ‘it-ness’ I’d like a word for. In triple terms I might have (#519, name, “Alan Dix”), (#519, height, “5:10”), (#519, eye_colour, “brown”) whereas gestalt is about the overall quality of entity #519, it-ness simply that #519 is an entity that can be the subject of sentences … even if there are no sentences about it! In OO circles ‘identity’ is the closest term I know.
Pingback: Alan’s blog » RDF sequences … could they be more semantic?