digging ourselves back from the Semantic Web mire

Discussions on the Talis Platform Advisory Group prompted me to look at some of the APIs of new Semantic-Web-like services such as Freebase[1].

Freebase is interesting because its underlying representation is graph/relationship based, like RDF, but its Metaweb Query Language (MQL) uses JSON, which is a more programming-like whole-and-parts representation with arrays and slots. Facebook’s new Data Store API also has objects and associations, but does not use RDF or other obvious web technologies.

So the question is: if the closest things to Semantic Web apps on the internet don’t use SemWeb technology like RDF, SPARQL etc. … are these SemWeb technologies fit for purpose, or indeed useful at all?

I think the answer is that (i) partly they are not fit for purpose – caught in a backwater by their history, but (ii) that is like all things and they are what we have got, and (iii) we can use some of the tools of computing to make them work …

Well, we can start off with XML. XML (from its roots in SGML) … and the ML gives it away … is a mark-up language for showing structure in text. However, we all (and I do it!) use it as a data representation notation. Its roots give it away: confusion of type and part naming[2], very heavy-weight notation for dealing with small data items such as numbers, and difficulties with binary data or even full character code sets for text[3]. There were so many IDLs that were so much better … if only … and interestingly JSON (as used in MQL and many REST services) is in the tradition of programming language formats and, not surprisingly, more succinct for data representation.
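To make the point about weight concrete, here is a minimal sketch that serialises the same small record both ways using Python's standard library. The record, its field names and the element-only XML layout are all invented for illustration; other XML conventions (attributes, for instance) would give different sizes, so treat this as one data point and not a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record, invented purely for illustration.
record = {"name": "widget", "price": 9.05, "tags": ["small", "blue"]}

def to_xml(rec):
    """Serialise the record as element-only XML (one common convention)."""
    root = ET.Element("item")
    ET.SubElement(root, "name").text = rec["name"]
    ET.SubElement(root, "price").text = str(rec["price"])
    tags = ET.SubElement(root, "tags")
    for tag in rec["tags"]:
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml(record)
json_text = json.dumps(record)

# Reading the values back: XML hands every value back as text,
# while JSON keeps the number as a number.
xml_price = ET.fromstring(xml_text).find("price").text
json_price = json.loads(json_text)["price"]

print(len(xml_text), xml_text)
print(len(json_text), json_text)
```

The size difference matters less than the last two lines of the comparison: the XML reader has to know from somewhere else that the price is a number, whereas the JSON notation carries that much typing for free.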

On to RDF … again initially for meta data and annotation of other, more complex, data such as XML objects. However, (for good reasons) increasingly we are using it as the data itself. So we end up using something designed for meta data for data ??? … and furthermore the XML is now being used as a second-order notation!

The underlying semantic model of RDF is triples … and yes in the theoretical CS world this was recognised many years ago as a model that could, in principle, be used to represent anything. Now strangely enough this had virtually no effect on people writing real code. This is a bit like the fact that in mathematics Peano axioms of arithmetic represent any number as combinations of 0 and ‘+1’, but … surprisingly … in supermarkets I have never been asked for 0+1+1+1+1+1+1+1+1+1 pounds and 0+1+1+1+1+1 pence.[4]

Triples are good for semantics, but programming languages use a variety of other constructs, and again it is interesting to see the Metaweb API using JSON as a ‘view’ on the underlying graph it supports.
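As a rough sketch of what a JSON-style ‘view’ over a graph might look like, the following folds a flat list of triples into nested dictionaries and lists rooted at one subject. This is not MQL and not how Metaweb does it; the triples, the identifiers, the `json_view` function and the `id` slot are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), invented for illustration.
TRIPLES = [
    ("band:police", "name", "The Police"),
    ("band:police", "album", "album:synchronicity"),
    ("band:police", "album", "album:zenyatta"),
    ("album:synchronicity", "name", "Synchronicity"),
    ("album:zenyatta", "name", "Zenyatta Mondatta"),
]

def json_view(subject, triples, depth=2):
    """Return a nested dict/list view of the graph rooted at `subject`.

    `depth` limits how many levels of objects are expanded, which also
    keeps the walk finite if the graph contains cycles.
    """
    index = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[s][p].append(o)

    def expand(node, remaining):
        # Literals and unexpanded nodes are returned unchanged.
        if node not in index or remaining == 0:
            return node
        out = {"id": node}
        for pred, objs in index[node].items():
            values = [expand(o, remaining - 1) for o in objs]
            # One value becomes a slot, several become an array.
            out[pred] = values[0] if len(values) == 1 else values
        return out

    return expand(subject, depth)

view = json_view("band:police", TRIPLES)
```

The graph underneath is unchanged; the programmer simply gets slots and arrays to walk instead of a bag of triples to join by hand.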

Finally poor old SPARQL inherits from SQL which, like COBOL before it, was designed as an attempt to be an end-user query language … but in so doing was never well-fitted for programmatic manipulation. … And it is also interesting that the semantics of SPARQL seems to need tuples not just triples … :-/

However, to avoid leaving everyone gloomy at the weekend … of course we always live with the past, and there are good lessons to learn for the future …

1) the Semantic Web is a bit like Pascal programmers who have just discovered that everything could be done in binary and have leapt upon machine code for its conceptual simplicity. But having seen the purity of the life of the aesthete, we probably need to come back to something more like Pascal

2) the graph/relation based model is very powerful, but other structures such as hierarchies, sequences, mappings will in different situations be either (a) more natural or (b) more efficient

3) we therefore need abstractions/layers/APIs/protocols that make triple stores more like other things … such as programming language structures in JSON … or maybe make triple stores look like relational databases 🙂

4) probably also abstractions that make non-triple structures more like the Semantic Web. These might be legacy structures (e.g. wrappers round relational databases) or specially designed ones (like the way Smalltalk made it look like even the number 2 was an object, but with efficient internal representations)
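Point 4 can be sketched in the same spirit: a thin wrapper that makes an ordinary relational table look like a stream of triples, without copying the data into a triple store. This is only an illustration under stated assumptions: the `person` table, its columns, the `table:key` subject naming and the `rows_as_triples` helper are all hypothetical, and the table and key names are interpolated into the SQL, so they must come from trusted code.

```python
import sqlite3

def rows_as_triples(conn, table, key):
    """Yield (subject, predicate, object) for each non-key, non-NULL cell.

    `table` and `key` are interpolated into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{table}:{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:
                yield (subject, col, value)

# A tiny in-memory database standing in for a legacy system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Ann", "Lancaster"), (2, "Bob", None)],
)

triples = list(rows_as_triples(conn, "person", "id"))
```

The rows stay where they are, in their efficient native form; only the interface presented to the outside world is triple-shaped, which is the same trick as Smalltalk making the number 2 look like an object.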

  1. listen to Talis’ podcast interview with Jamie Taylor, Freebase’s ‘Minister of Information’ (sic).
  2. why attempts to do CSS-style print definitions in SGML always came a cropper, and what SOAP at enormous complexity tried to solve
  3. I recently had text that contained tabs … sorry not in the accepted characters for XML data!
  4. And similarly, despite its disciples, LISP is not the dominant language despite its data structure simplicity

2 thoughts on “digging ourselves back from the Semantic Web mire”

  1. As Alan’s post arose from a discussion on the Advisory Group mailing list, let me share the response that I sent to that list…

    Alan

    thanks for that. Were you having a bad day? 😉

    RDF and the other Semantic Web technologies certainly aren’t perfect, and perhaps we’re therefore seeing a number of attempts to develop more focussed solutions to the specific subset of problems faced by each of the activities that you mention.

    The problem comes, of course, when we wish to expose data from some of these to the wider Web, or when we wish to interoperate between them. At that point, don’t we end up reinventing or reverse engineering an awful lot of the stuff that’s in RDF, OWL, SPARQL etc that these more lightweight alternatives originally omitted?

    Either that, or you end up with competing and partisan solutions that have to fight it out… or wait until tool developers cope with the additional complexity of supporting multiple possibilities.

    Our own chosen path has been to remain pretty close to the W3C specifications, and although not without their issues we are finding them largely fit for (our) purposes. Ian can elucidate… 😉

    Son/daughter of RDF will end up better, though, by observing and learning from the successes and failures associated with each of these attempts.

    Paul


    Dr Paul Miller
    Technology Evangelist, Talis
    w: http://www.talis.com/platform skype: napm1971
    mobile/cell: +44 7769 740083

    http://www.linkedin.com/in/pau1mi11er

  2. Pingback: Alan’s blog » practical RDF
