In praise of inconsistency - the long tail of small data
Lancaster University and Talis www.hcibook.com/alan/ www.alandix.com/blog
Distinguished Alumnus Seminar, University of York, UK, 26th October 2011.
Abstract

Anyone who has taken a database course knows about normalisation, with its goal of creating a single coherent, consistent and authoritative data source. Whilst most databases do not follow all of Codd's rules, the traditional approach in information systems design has still been to analyse extensively so that a single internally consistent data source can serve the organisation. This same approach is applied to even larger projects, such as the electronic patient records system, which the Public Accounts Committee has recently recommended abandoning at a cost to date of £2.7bn.

An alternative approach to such systems is instead to deliberately encourage multiple, but interlinked, data resources, from spreadsheets to full SQL databases, with a policy of "do not enforce consistency, but highlight inconsistency", which is just how real people work.

With the Semantic Web similar forces are often at work at web scale, with the desire to create single coherent vocabularies, even if data consistency is less strongly enforced. Of course the practice is often less ordered. Is this a weakness or a potential strength? For large data sources, such as the OS open data, it is possible to create a coherent view, but what about the many smaller spreadsheets, web tables and micro-APIs that are proliferating? Can we harness this long tail of emerging data? If we are to do so, then I suggest that the way forward is to seek human-oriented semantics that accepts a level of inconsistency, but gives people the ability to track, describe and, where necessary, resolve it in context.
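
As a rough illustration of leaving several data sources unreconciled while flagging where they disagree, the idea might be sketched as follows. This is only a minimal sketch and not something taken from the talk: the function name `highlight_inconsistencies`, the source names and the sample records are all hypothetical.

```python
from collections import defaultdict


def highlight_inconsistencies(sources):
    """Report disagreements between sources without changing any of them.

    `sources` maps a source name to {record key: {field: value}}.
    The result maps (record key, field) to {value: [source names]} for
    every field on which at least two sources hold different values.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for name, records in sources.items():
        for key, fields in records.items():
            for field, value in fields.items():
                seen[(key, field)][value].append(name)
    # Keep only the slots where more than one distinct value was recorded.
    return {slot: dict(values) for slot, values in seen.items() if len(values) > 1}


# Hypothetical sample data: two independently maintained sources.
sources = {
    "hr_spreadsheet": {"emp-17": {"office": "B12", "phone": "5551"}},
    "facilities_db": {"emp-17": {"office": "C03", "phone": "5551"}},
}

conflicts = highlight_inconsistencies(sources)
for (key, field), values in conflicts.items():
    print(key, field, values)
```

Neither source is altered or declared authoritative here; the disagreement over the office is simply surfaced, together with which source says what, so that a person can resolve it in context if it matters.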
Alan Dix 28/10/2011