In praise of inconsistency - the long tail of small data

Alan Dix
Lancaster University and Talis
www.hcibook.com/alan/
www.alandix.com/blog

Distinguished Alumnus Seminar, University of York, UK, 26th October 2011.

download slides (PDF, 2up, 3.2Mb)


Full reference:
A. Dix (2011). In praise of inconsistency - the long tail of small data. Distinguished Alumnus Seminar, University of York, UK, 26th October 2011.
http://www.hcibook.com/alan/talks/York-Alumnus-2011-inconsistency/
York seminar page

FireFly lights on a Christmas tree
FireFly - one thousand computers on a Christmas tree


vfridge - a fridge door on the web

abstract

Anyone who has taken a database course knows about normalisation, with its goal of creating a single coherent, consistent and authoritative data source. Whilst most databases do not follow all of Codd's rules, the traditional approach in information systems design has still been to analyse extensively so that a single internally consistent data source can serve the organisation. This same approach is applied to even larger projects, such as the electronic patient records system, which the Public Accounts Committee has recently recommended abandoning at a cost to date of £2.7bn. An alternative approach to such systems is instead to deliberately encourage multiple, but interlinked, data resources, from spreadsheets to full SQL databases, with a policy of "do not enforce consistency, but highlight inconsistency", which is just how real people work. With the Semantic Web similar forces are often at work at web scale, with the desire to create single coherent vocabularies, even if data consistency is less strongly enforced. Of course the practice is often less ordered. Is this a weakness or a potential strength? For large data sources, such as the OS open data, it is possible to create a coherent view, but what about the many smaller spreadsheets, web tables and micro-APIs that are proliferating? Can we harness this long tail of emerging data? If we are to do so, then I suggest that the way forward is to seek human-oriented semantics that accepts a level of inconsistency, but gives people the ability to track, describe and, where necessary, resolve it in context.



Alan Dix 28/10/2011