Freebase is interesting as its underlying representation is graph/relationship based like RDF, but its Metaweb Query Language (MQL) uses JSON which is a more programming-like whole and parts representation with arrays and slots. Facebook’s new Data Store API also has objects and associations, but does not use RDF or other obvious web technologies.
So the question is – if the closest things to Semantic Web apps on the internet don’t use SemWeb technology like RDF, SPARQL etc. … are these SemWeb technologies fit for purpose or indeed useful at all?
I think the answer is that (i) partly they are not fit for purpose – caught in a backwater by their history, but (ii) that is like all things and they are what we have got, and (iii) we can use some of the tools of computing to make them work …
Well, we can start off with XML. XML (from its roots in SGML) … and the ML gives it away … is a mark-up language for showing structure in text. However, we all (and I do it!) use it as a data representation notation. Its roots give it away – confusion of type and part naming,[2] very heavy-weight notation for dealing with small data items such as numbers, difficulties with binary data or even full character code sets for text.[3] There were so many IDLs that were so much better … if only … and interestingly JSON (as used in MQL and many REST services) is in the tradition of programming language formats and, not surprisingly, more succinct for data representation.
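To make the 'heavy-weight for small data items' point concrete, here is a minimal sketch that writes the same two numbers as JSON and as XML. The record, the element names and the element-per-field encoding are all assumptions for illustration; an attribute-based XML encoding would be shorter, though still a string-only one.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: two small numbers.
record = {"x": 3, "y": 4}

# JSON: compact, and numbers stay numbers.
as_json = json.dumps(record, separators=(",", ":"))

# XML: one element per field (one encoding among many possible ones).
root = ET.Element("point")
for field, value in record.items():
    ET.SubElement(root, field).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

# Reading back: JSON yields an int, XML yields text the caller must convert.
x_from_json = json.loads(as_json)["x"]
x_from_xml = ET.fromstring(as_xml).find("x").text

print(len(as_json), as_json)
print(len(as_xml), as_xml)
```

The size difference is minor here; the more telling part is that the XML reader gets back the text "3" and has to know, from somewhere else, that it was meant to be a number.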
On to RDF … again initially for meta data and annotation of other, more complex, data such as XML objects. However, (for good reasons) increasingly we are using it as the data itself. So we end up using something designed for meta data for data ??? … and furthermore the XML is now being used as a second-order notation!
The underlying semantic model of RDF is triples … and yes in the theoretical CS world this was recognised many years ago as a model that could, in principle, be used to represent anything. Now strangely enough this had virtually no effect on people writing real code. This is a bit like the fact that in mathematics Peano axioms of arithmetic represent any number as combinations of 0 and ‘+1’, but … surprisingly … in supermarkets I have never been asked for 0+1+1+1+1+1+1+1+1+1 pounds and 0+1+1+1+1+1 pence.[4]
Triples are good for semantics, but programming languages use a variety of other constructs, and again it is interesting to see the Metaweb API using JSON as a ‘view’ on the underlying graph it supports.
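As a rough illustration of what a JSON 'view' over a graph might look like, the sketch below folds flat triples into a nested whole-and-parts structure. It is not MQL and not how Freebase works internally; the identifiers, predicate names and the `json_view` helper are hypothetical.

```python
import json

# Hypothetical triples (subject, predicate, object); ids are invented.
TRIPLES = [
    ("band:1", "name", "The Police"),
    ("band:1", "album", "album:1"),
    ("band:1", "album", "album:2"),
    ("album:1", "name", "Outlandos d'Amour"),
    ("album:2", "name", "Synchronicity"),
]

def json_view(subject, triples, depth=2):
    """Build a nested dict for one subject from flat triples."""
    subjects = {s for s, _, _ in triples}
    view = {"id": subject}
    for s, p, o in triples:
        if s != subject:
            continue
        # Expand objects that are themselves nodes, while depth remains;
        # the depth limit also stops cycles from recursing forever.
        if o in subjects and depth > 0:
            value = json_view(o, triples, depth - 1)
        else:
            value = o
        view.setdefault(p, []).append(value)
    # Collapse single-element lists into plain slots.
    return {k: (v[0] if isinstance(v, list) and len(v) == 1 else v)
            for k, v in view.items()}

print(json.dumps(json_view("band:1", TRIPLES), indent=2))
```

The graph stays the underlying truth; the nested form is just one convenient projection of it, with arrays and slots appearing only where the data has several values or one.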
Finally poor old SPARQL inherits from SQL which, like COBOL before it, was designed as an attempt to be an end-user query language … but in so doing was never well-fitted for programmatic manipulation. … And it is also interesting that the semantics of SPARQL seems to need tuples not just triples … :-/
However, to avoid leaving everyone gloomy at the weekend … of course we always live with the past, and there are good lessons to learn for the future …
1) the Semantic Web is a bit like Pascal programmers who have just discovered that everything could be done in binary and have leapt upon machine code for its conceptual simplicity. But having seen the purity of the life of the aesthete, we probably need to come back to something more like Pascal
2) the graph/relation based model is very powerful, but other structures such as hierarchies, sequences, mappings will in different situations be either (a) more natural or (b) more efficient
3) we therefore need abstractions/layers/APIs/protocols that make triple stores more like other things … such as programming language structures in JSON … or maybe make triple stores look like relational databases
4) probably also abstractions that make non-triple structures more like the Semantic Web. These might be legacy structures (e.g. wrappers round relational databases) or specially designed ones (like the way Smalltalk made it look like even the number 2 was an object, but with efficient internal representations)
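Points 3 and 4 can be sketched in both directions: a wrapper that makes relational rows look like triples, and a pivot that makes triples look like rows again. This is a toy under stated assumptions; the table, column names, URI-like naming scheme and both helper functions are invented for illustration, and a real wrapper would also need to handle types, joins and multi-valued properties.

```python
def rows_to_triples(table, rows, key):
    """Expose relational rows as (subject, predicate, object) triples."""
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            # The key becomes the subject; NULLs simply produce no triple.
            if column != key and value is not None:
                yield (subject, f"{table}#{column}", value)

def triples_to_rows(triples, columns):
    """Pivot triples into row-like dicts, one per subject (last value wins)."""
    rows = {}
    for s, p, o in triples:
        if p in columns:
            rows.setdefault(s, {"id": s})[p] = o
    return list(rows.values())

# Hypothetical legacy table.
people = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Bob", "age": None},
]

triples = list(rows_to_triples("person", people, "id"))
names = triples_to_rows(triples, {"person#name"})
print(triples)
print(names)
```

Note the asymmetry: going from rows to triples loses nothing except the NULLs, whereas going back requires choosing which predicates count as columns and what to do when a subject has several values for one of them.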
- listen to Talis’ podcast interview with Jamie Taylor, Freebase’s ‘Minister of Information’ (sic).
- why attempts to do CSS-style print definitions in SGML always came a cropper, and what SOAP at enormous complexity tried to solve
- I recently had text that contained tabs … sorry not in the accepted characters for XML data!
- And similarly, despite its disciples, LISP is not the dominant language despite its data structure simplicity