web ephemera and web privacy

Yesterday I was twittering about a web page I’d visited on the BBC[1] and the tweet also became my Facebook status[2].  Yanni commented on it, not because of the content of the link, but because he noticed the ‘is.gd’ url was very compact.  Thinking about this has some interesting implications for privacy/security and the kind of things you might want to use different url shortening schemes for, but also led me to develop an interesting time-wasting application ‘LuckyDip’ (well if ‘develop’ is the right word as it was just 20-30 mins hacking!).

I used the ‘is.gd’ shortening because it was one of three schemes offered by twirl, the twitter client I use.  I hadn’t actually noticed that it was significantly shorter than the others or indeed tinyurl, which is what I might have thought of using without twirl’s interface.

Here is the url of this blog <http://www.alandix.com/blog/> shortened by is.gd and three other services:

snurl:   http://snurl.com/5ot5k
twurl:  http://twurl.nl/ftgrwl
tinyurl:  http://tinyurl.com/5j98ao
is.gd:  http://is.gd/7OtF

The is.gd link is small for two reasons:

  1. ‘is.gd’ is about as short as you can get with a domain name!
  2. the ‘key’ bit after the domain is only four characters as opposed to 5 (snurl) or 6 (twurl, tinyurl)

The former is just clever domain choice; it is hard to get something short at all, let alone short and meaningful[3].

The latter, however, is the result of a design choice at is.gd.  The is.gd urls are allocated sequentially; the ‘key’ bit (7OtF) is simply an encoding of the sequence number that was allocated.  In contrast tinyurl seems to do some sort of hash either of the address or maybe of a sequence number.
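To make the sequential idea concrete, here is a minimal sketch of how a sequence number could be turned into a short key and back. It assumes a 62-character alphabet (digits, then lower case, then upper case); the real is.gd alphabet and ordering are not known from the outside, so `ALPHABET`, `encode_key` and `decode_key` are purely illustrative names.

```python
import string

# Assumed alphabet: 10 digits + 26 lower-case + 26 upper-case = 62 symbols.
# The ordering actually used by is.gd is an assumption here.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_key(seq: int) -> str:
    """Write a non-negative sequence number in base 62."""
    if seq < 0:
        raise ValueError("sequence numbers start at 0")
    if seq == 0:
        return ALPHABET[0]
    chars = []
    while seq > 0:
        seq, rem = divmod(seq, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_key(key: str) -> int:
    """Turn a key back into the sequence number it encodes."""
    seq = 0
    for ch in key:
        seq = seq * BASE + ALPHABET.index(ch)
    return seq
```

With this scheme every key of up to four characters covers the first 62^4 (about 14.8 million) allocations, which is why the keys stay so short.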

The side effect of this is that if you simply type in a random key (below the last allocated sequence number) for an is.gd url it will be a valid url.  In contrast, the space of tinyurl is bigger, so ‘in principle’ only about one in a hundred keys will represent real pages … now I say ‘in principle’ because experimenting with tinyurl I find every six character sequence I type as a key gets me to a valid page … so maybe they do some sort of ‘closest’ match.
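The ‘one in a hundred’ kind of estimate is just the number of allocated keys divided by the size of the key space. A rough sketch follows; the alphabet sizes (62 symbols for four-character keys, 36 for six-character lower-case keys) are assumptions read off the example urls above, and the allocation count in the usage note is made up purely for illustration.

```python
def key_space(alphabet_size: int, key_length: int) -> int:
    """Number of distinct keys of exactly this length."""
    return alphabet_size ** key_length


def hit_probability(allocated: int, alphabet_size: int, key_length: int) -> float:
    """Chance that a randomly typed key of this length is one already in use."""
    return min(1.0, allocated / key_space(alphabet_size, key_length))
```

For example, `key_space(62, 4)` is 14,776,336 while `key_space(36, 6)` is 2,176,782,336, so a hypothetical 21.8 million allocations would fill the first space completely but give a hit rate of only about 1% in the second.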

Whatever url shortening scheme you use by their nature the shorter url will be less redundant than a full url – more ‘random’ permutations will represent meaningful items.  This is a natural result of any ‘language’, the more concise you are the less redundant the language.

At a practical level this means that if you use a shortened url, it is more likely that someone  typing in a random is.gd (or tinyurl) key will come across your page than if they just type a random url.  Occasionally I upload large files I want to share to semi-private urls, ones that are publicly available, but not linked from anywhere.  Because they are not linked they cannot be found through search engines and because urls are long it would be highly unlikely that someone typing randomly (or mistyping) would find them.

If however, I use url shortening to tell someone about it, suddenly my semi-private url becomes a little less private!

Now of course this only matters if people are randomly typing in urls … and why would they do such a thing?

Well a random url on the web is not very interesting in general, there are 100s of millions and most turn out to be poor product or hotel listing sites.  However, people are only likely to share interesting urls … so random choices of shortened urls are actually a lot more interesting than random web pages.

So, just for Yanni, I spent a quick 1/2 hour[4] and made a web page/app ‘LuckyDip’.  This randomly chooses a new page from is.gd every 20 seconds – try it!
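The core of such a page is tiny. The sketch below is not the LuckyDip code (which was JavaScript driving a frame); it only illustrates the selection step in Python, assuming keys are base-62 encodings of sequence numbers and that you supply a guess, `last_allocated`, for the highest number handed out so far. Both the alphabet and that parameter are assumptions.

```python
import random
import string

# Assumed key alphabet; the real is.gd ordering may differ.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_key(seq):
    """Base-62 encode a non-negative integer using ALPHABET."""
    key = ""
    while True:
        seq, rem = divmod(seq, len(ALPHABET))
        key = ALPHABET[rem] + key
        if seq == 0:
            return key


def lucky_dip_url(last_allocated, rng=random):
    """Pick a random already-allocated sequence number and build its short url."""
    seq = rng.randrange(1, last_allocated + 1)
    return "http://is.gd/" + encode_key(seq)
```

A page would then call something like `lucky_dip_url` on a 20 second timer and load the result into a frame.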


successive pages from LuckyDip

Some of the pages are in languages I can’t read, occasionally you get a broken link, and the ones that are readable, are … well … random … but oddly compelling.  They are not the permanently interesting pages you choose to bookmark for later, but the odd page you want to send to someone … often trivia, news items, even (given is.gd is in a twitter client) the odd tweet page on the twitter site.  These are not like the top 20 sites ever, but the ephemera of the web – things that someone at some point thought worth sharing, like overhearing the odd raised voice during a conversation in a train carriage.

Some of the pages shown are map pages, including ones with addresses on … it feels odd, voyeuristic, web curtain twitching – except you don’t know the person, the reason for the address; so maybe more like sitting watching people go by in a crowded town centre, a child cries, lovers kiss, someone’s newspaper blows away in the wind … random moments from unknown lives.

In fact most things we regard as private are not private from everyone.  It is easy to see privacy like an onion skin with the inner sanctum, then those further away, and then complete strangers – the further away someone is from ‘the secret’ the more private something is.  This is certainly the classic model in military security.  However, think further and there are many things you would be perfectly happy for a complete stranger to know, but maybe not those a little closer, your work colleagues, your commercial competitors.  The onion sort of reverses, apart from those that you explicitly want to know, in fact the further out of the onion, the safer it is.  Of course this can go wrong sometimes, as Peter Mandelson found out chatting to a stranger in a taverna (see BBC blog).

So I think LuckyDip is not too great a threat to the web’s privacy … but do watch out what you share with short urls … maybe the world needs a url lengthening service too …

And as a postscript … last night I was trying out the different shortening schemes available from twirl, and accidentally hit return, which created a tweet with the ‘test’ short url in it.  Happily you can delete tweets, and so I thought I had eradicated the blunder unless any twitter followers happened to be watching at that exact moment … but I forgot that my twitter feed also goes to my Facebook status and that deleting the tweet on twitter did not remove the status, so overnight the slip was my Facebook status and at least one person noticed.

On the web nothing stays secret long, and if anything is out there, it is there for ever … and will come back to haunt you someday.

  1. This is the tweet “Just saw http://is.gd/7Irv Sad state of the world is that it took me several paragraphs before I realised it was a joke.”
  2. I managed to link them up some time ago, but cannot find again the link on twitter that enabled this, so would be stuck if I wanted to stop it!
  3. anyone out there registering Bangladeshi domains … if ‘is’ is available!!
  4. yea it should have been less, but I had to look up how to access frames in javascript, etc.

Coast to coast: St Andrews to Tiree

A week ago I was in St Andrews on the east coast of Scotland delivering three lectures on “Human Computer Interaction: as it was, as it is and as it may be” as part of their distinguished lecture series and now I am in Tiree in the wild western ocean off the west coast.

I had a great time in St Andrews and was well looked after by some I knew already Ian, Gordan, John and Russell, and also met many new people. Ate good food and stayed in a lovely hotel overlooking the sea (and golf course) and full of pictures of golfers (well what do you expect in St Andrews).

For the lectures, I was told the general pattern was one lecture about the general academic area, one ‘state of the art’ and one about my own stuff … hence the three parts of the title!  Ever one for cutesy titles, I then called the individual lectures “Whose Computer Is It Anyway”, “The Great Escape” and “Connected, but Under Control, Big, but Brainy?”.

The first lecture was about the fact that computers are always ultimately for people (surprise surprise!) and I used Ian’s slight car accident on the evening before the lecture as a running example (sorry Ian).

The second lecture was about the way computers have escaped the office desktop and found their way into the physical world of ubiquitous computing, the digital world of the web and into our everyday lives in our homes and increasingly the hub of our social lives too.  Matt Oppenheim did some great cartoons for this and I’m going to use them again in a few weeks when I visit Dublin to do the inaugural lecture for SIGCHI Ireland.

for 20 years the computer is chained to the office desktop (image © Matt Oppenheim)

... now escapes: out into the world, spreading across the net, in the home, in our social lives (image © Matt Oppenheim)

The last lecture was about intelligent internet stuff, similar to the lecture I gave at Aveiro a couple of weeks back … mentioning again the fact that the web now has the same information storage and processing capacity as a human brain[1] … always makes people think … well at least it always makes ME think about what it means to be human.

… and now … in Tiree … sun, wild wind, horizontal hail, and paddling in the (rather chilly) sea at dawn

  1. see the brain and the web

web of data practitioners’ days

I am at the Web of Data Practitioners Days (WOD-PD 2008) in Vienna.  Mixture of talks and guided hands-on sessions.  I presented first half of session on “Using the Web of Data” this morning with focus (surprise) on the end user. Learnt loads about some of the applications out there – in fact Richard Cyganiak .  Interesting talk from a guy at the BBC about the way they are using RDF to link the currently disconnected parts of their web and also archives.  Jana Herwig from Semantic Web Company has been live blogging the event.

Being here has made me think about the different elements of SemWeb technology and how they individually contribute to the ‘vision’ of Linked Data.  The aim is to be able to link different data sources together.  For this having some form of shared/public vocabulary or ‘data definitions’ is essential, as is some relatively uniform way of accessing data.  However, the implementation using RDF or use of SPARQL etc. seems to be secondary and useful for some data, but not other forms of data where tabular data may be more appropriate.  Linking these different representations together seems far more important than specific internal representations.  So I am wondering whether there is a route to linked data that allows a more flexible interaction with existing data and applications as well as ‘sucking’ in this data into the SemWeb.  Can the vocabularies generated for SemWeb be used as meta information for other forms of information, and can query/access protocols be designed that leverage this, but include a broader range of data types?

Comics and happy problem solving

I am in Eindhoven doing CSCW, silly ideas and other things with the USI students here. On the book shelf here is Scott McCloud’s “Understanding Comics”. I picked this up last year and couldn’t put it down until I had read it all. There is another book on the shelves this year, “Reinventing Comics”, and I daren’t pick it up until I’ve done all the work I want to today!

Understanding Comics is both an apologetic for comics as an art form and also an exploration into what makes a comic a comic and how comics manage to captivate and give a sense of narrative and action through what are basically static images. As well as being a good read about comics and about art there seem to be many lessons there for other forms of narrative and animation especially on the web.

As far as I can see (without starting to read it and not being able to stop), Reinventing Comics seems to be about the way online delivery through the web is giving new opportunities for Comic art … but maybe when I finish everything today I will find out.

Less graphic and less fun, but no less fascinating, I have been dipping into chapters of “The Psychology of Problem Solving“, which was also sitting on the USI shelves. I was particularly enthralled by descriptions of experiments where subjects were asked to accomplish divergent thinking tasks whilst either pushing their palms upwards from under a table, or pushing down from on top. The former a positive, ‘come to me’ gesture elicited more diverse ideas than the latter, negative, ‘go away’ gesture, even though the only difference was the muscle groups in tension. I’ve seen other research that shows how our brains monitor our body state to ‘see how we feel’ (like smiling therapy), but this was one of the most subtle and conclusive.

During the week I have had the USI students work through a design brief starting with silly ideas then moving through  structured analysis to good ideas. Perhaps I should have had them pushing up on tables in the first part and down in the second?

PPIG2008 and the twenty first century coder

Last week I was giving a keynote at the annual workshop PPIG2008 of the Psychology of Programming Interest Group.   Before I went I was politely pronouncing this pee-pee-eye-gee … however, when I got there I found the accepted pronunciation was pee-pig … hence the logo!

My own keynote at PPIG2008 was “as we may code: the art (and craft) of computer programming in the 21st century” and was an exploration of the changes in coding from 1968 when Knuth published the first of his books on “the art of computer programming”.  On the web site for the talk I’ve made a relatively unstructured list of some of the distinctions I’ve noticed between 20th and 21st Century coding (C20 vs. C21); and in my slides I have started to add some more structure.  In general we have a move from a more mathematical, analytic, problem solving approach, to something more akin to a search task, finding the right bits to fit together with a greater need for information management and social skills. Both this characterisation and the list are, of course, a gross simplification, but seem to capture some of the change of spirit.  These changes suggest different cognitive issues to be explored and maybe different personality types involved – as one of the attendees, David Greathead, pointed out, rather like the judging vs. perceiving personality distinction in Myers-Briggs[1].

One interesting comment on this was from Marian Petre, who has studied many professional programmers.  Her impression, echoed by others, was that the heavy-hitters were the more experienced programmers who had adapted to newer styles of programming, whereas the younger programmers found it harder to adapt the other way when they hit difficult problems.  Another attendee suggested that perhaps I was focused more on application coding and that system coding and system programmers were still operating in the C20 mode.

The social nature of modern coding came out in several papers about agile methods and pair programming.  As well as being an important phenomenon in its own right, pair programming gives a level of think-aloud ‘for free’, so maybe this will also cast light on individual coding.

Margaret-Anne Storey gave a fascinating keynote about the use of comments and annotations in code and again this picks up the social nature of code as she was studying open-source coding where comments are often for other people in the community, maybe explaining actions, or suggesting improvements.  She reviewed a lot of material in the area and I was especially interested in one result that showed that novice programmers with small pieces of code found method comments more useful than class comments.  Given my own frequent complaint that code is inadequately documented at the class or higher level, this appeared to disagree with my own impressions.  However, in discussion it seemed that this was probably accounted for by differences in context: novice vs. expert programmers, small vs large code, internal comments vs. external documentation.  One of the big problems I find is that the way different classes work together to produce effects is particularly poorly documented.  Margaret-Anne described one system her group had worked on[2] that allowed you to write a tour of your code opening windows, highlighting sections, etc.

I sadly missed some of the presentations as I had to go to other meetings (the danger of a conference at your home site!), but I did get to some and  was particularly fascinated by the more theoretical/philosophical session including one paper addressing the psychological origins of the notions of objects and another focused on (the dangers of) abstraction.

The latter, presented by Luke Church, critiqued Jeannette Wing’s 2006 CACM paper on Computational Thinking.  This is evidently a ‘big thing’ with loads of funding and hype … but one that I had entirely missed :-/ Basically the idea is to translate the ways that one thinks about computation to problems other than computers – nerds rule OK. The tenets of computational thinking seem to overlap a lot with management thinking and also reminded me of the way my own HCI community and also parts of the Design (with capital D) community in different ways are trying to say that we/they are the universal discipline … well if we don’t say it about our own discipline who will … the physicists have been getting away with it for years 😉

Luke’s (and his co-authors’) argument is that abstraction can be dangerous (although of course it is also powerful).  It would be interesting perhaps, rather than Wing’s paper, to look at this argument alongside Jeff Kramer’s 2007 CACM article “Is abstraction the key to computing?”, which I recall liking because it says computer scientists ought to know more mathematics 🙂 🙂

I also sadly missed some of Adrian Mackenzie’s closing keynote … although this time not due to competing meetings but because I had been up since 4:30am reading a PhD thesis and after lunch on a Friday had begun to flag!  However, this was no reflection on Adrian’s talk and the bits I heard were fascinating, looking at the way bio-tech is using the language of software engineering.  This sparked a debate relating back to the overuse of abstraction, especially in the case of the genome where interactions between parts are strong and so the software component analogy weak.  It also reminded me of yet another relatively recent paper[3] on the way computation can be seen in many phenomena and should not be construed solely as a science of computers.

As well as the academic content it was great to be with the PPIG crowd; they are a small but very welcoming and accepting community – I don’t recall anything but constructive and friendly debate … and next year they have PPIG09 in Limerick – PPIG and Guinness, what could be better!

  1. David has done some really interesting work on the relationship between personality types and different kinds of programming tasks.  I’ve seen him present before about debugging and unfortunately had to miss his talk at PPIG on comprehension.  Given his work has shown clearly that there are strong correlations between certain personality attributes and coding, it would be good to see more qualitative work investigating the nature of the differences.   I’d like to know whether strategies change between personality types: for example, between systematic debugging and more insight-based ‘scan and see it’ bug finding.
  2. but I can’t find on their website :-(
  3. Perhaps 2006/2007 in either CACM or Computer Journal, if anyone knows the one I mean please remind me!

eprints: relaxed and scalable interfaces

A story, a bit of a moan … and then I hope some constructive ideas.

It is time for the University annual report, which includes a list of all publications across the University. In previous years this was an easy job. I keep an up-to-date web page with all my publications for each year, so I simply gave our secretaries a link to the web publication list, they cut and paste it into Word, tidied the format a little … and job done. However, this year things are different … a short while ago the department installed an EPrints server. This year the department is making its submission to the University by downloading from the EPrints server, which means we have to upload to it :-/

The citation adding page runs to several screenfuls, including breaking author names down into surname and forename … the thought of that was somewhat daunting.

Fortunately you can import into EPrints from BibTeX and EndNote bibliographies … unfortunately mine is in plain HTML 🙁

Now the 10 million AKT project that Southampton was a lead partner in developed a free text bibliography server … but, unfortunately, not included in EPrints 🙁

So a few regular expression substitutions and a lot of hand edits later and I convert my 2007 pub list into BibTeX (actually a couple of hours in total including ‘bug fixing’ syntax errors in the BibTeX).
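For anyone facing the same job, the regular expression step might look something like the sketch below. It is not the substitutions I actually used: it assumes a made-up list format of `<li>Authors (Year). Title. <i>Venue</i>.</li>` with authors separated by semicolons, and it guesses `@inproceedings` for every entry. Real lists are messier, which is where the hand edits come in.

```python
import re

# Hypothetical layout of one publication in the HTML list:
#   <li>Surname, I.; Surname, I. (2007). Title of paper. <i>Venue</i>.</li>
ENTRY = re.compile(
    r"<li>\s*(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.\s*"
    r"(?P<title>.+?)\.\s*<i>(?P<venue>.+?)</i>.*?</li>",
    re.DOTALL,
)


def html_to_bibtex(html):
    """Convert every <li> matching ENTRY into a BibTeX record."""
    records = []
    for n, m in enumerate(ENTRY.finditer(html), start=1):
        # BibTeX wants authors joined by the word "and".
        authors = " and ".join(a.strip() for a in m["authors"].split(";"))
        key = "pub%s_%d" % (m["year"], n)  # invented citation key
        records.append(
            "@inproceedings{%s,\n"
            "  author = {%s},\n"
            "  title = {%s},\n"
            "  booktitle = {%s},\n"
            "  year = {%s}\n"
            "}" % (key, authors, m["title"].strip(), m["venue"].strip(), m["year"])
        )
    return "\n\n".join(records)
```

Anything the pattern does not match is silently skipped, so it is worth comparing the number of records produced with the number of list items before uploading.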

Then upload the clean .bib file … beautiful – I get a list of all the uploaded items … but they are in my ‘user workspace’ and not properly deposited. This I have to do one-by-one and I am not allowed to do so until I have filled in various additional fields, scattered liberally over several forms including one form for adding subjects that requires several clicks to open up a lovely tree browser that in the end has only 2 leaves.

Now after grouching the lessons.

There seem to be a few key problems:

(1) First the standard usability issues: the inclusion pages are oriented around the data in the system not the user, there are no shortcuts for previously entered authors, etc.

(2) The system will not allow data to be entered if it is not complete. Of course the institution wants full data (e.g. whether it is refereed, etc.), but making it difficult to enter data makes it likely that users will not bother. That is, the alternative to perfect data may be no data!

(3) The interface to enter and edit is fine for a small number of entries, but becomes a pain when processing a complete publication list. Contrarily, the page for setting the subject categories is designed for handling large trees of categories but does not gracefully handle a small number.

Both (2) and (3) are also common problems, but not so well considered in the usability literature.

A useful information systems heuristic that I often advocate is

“don’t enforce consistency, but highlight inconsistency”

In this case why not allow me to deposit incomplete records and then leave me a ‘to do list’ page … yes and maybe even badger me periodically with automatic emails to check it.
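As a minimal sketch of what that could mean in code: accept every deposit, and compute the gaps on demand rather than refusing the record. The field names and the `Record` and `Repository` classes below are invented for illustration and have nothing to do with how EPrints is actually built.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Fields the institution would like filled in eventually (assumed list).
REQUIRED_FIELDS = ("title", "authors", "year", "refereed", "subjects")


@dataclass
class Record:
    title: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    year: Optional[int] = None
    refereed: Optional[bool] = None  # None means "not answered yet"
    subjects: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of required fields that are still empty."""
        missing = []
        for name in REQUIRED_FIELDS:
            value = getattr(self, name)
            if value is None or value == "" or value == []:
                missing.append(name)
        return missing


class Repository:
    def __init__(self) -> None:
        self.records: List[Record] = []

    def deposit(self, record: Record) -> None:
        """Always accept the record, complete or not."""
        self.records.append(record)

    def todo_list(self) -> List[Tuple[Optional[str], List[str]]]:
        """Highlight the inconsistency: which records still need what."""
        return [(r.title, r.missing_fields())
                for r in self.records if r.missing_fields()]
```

The `todo_list` output is exactly what a ‘to do’ page or a periodic reminder email would be built from.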

Another maxim that applies to (2) is:

“Make it easy for the user to do what you want”

If you want people to upload references make it as easy as possible to do so. Now I’m sure the designers intend this to be the case, but it is easy sometimes to focus on usability of individual screens and interactions rather than the wider context.

In fact, this was the second time that I was faced with problem (3) today. Fiona had accidentally double clicked a large number of archived files when she was trying to drag them to Trash. She had to kill the application as it blindly started to open dozens of files (why not ask?). However, it was clearly coded resiliently and kept backup copies of the files it had started to open, so, when she tried to re-open it, InDesign started to ask her whether she wanted to recover the files … but did so one-by-one and wouldn’t let her do anything else until she had laboriously answered every dialogue box.

In this case the solution is fairly obvious: if there are many (or even more than one) files to be recovered why not list them and ask about them all, perhaps with check boxes so you can recover some but not others. In general tabular or list-style views tend to work better with large numbers of items, allowing you to perform edits to many items in a single transaction.

Similarly in EPrints, after the import there were just a few fields required for each entry; some form of tabular view would have allowed me to scan down the list and select ‘refereed/not refereed’ for each entry.

With the subject categories, it was in a sense the opposite problem, but a symptom of the way we, as designers, often have some idea in our heads about how large a particular set is likely to be and then design around that idea. However, if you can notice this tendency you can often produce variant interaction styles depending on the size of the set. For example, in web-based systems to browse hierarchies I have often (but not always!) added code that effectively says, “if the number of entries at this level is not too great, then show this level as headings with the next level as well.”
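In code that rule is only a few lines. The sketch below is an illustration rather than anything lifted from one of my systems: it assumes the hierarchy is a dictionary mapping each entry to a list of its children, produces plain text lines instead of HTML, and uses an arbitrary threshold of 5.

```python
def render_level(tree, threshold=5):
    """Return display lines for one level of a hierarchy.

    tree maps each entry name to a list of child names.  If the level is
    small, entries become headings with their children listed beneath;
    otherwise only the entries are shown, to be opened one at a time.
    """
    lines = []
    expand = len(tree) <= threshold
    for name, children in tree.items():
        if expand:
            lines.append(name)  # acts as a heading
            for child in children:
                lines.append("    " + child)
        else:
            lines.append("%s (%d sub-entries)" % (name, len(children)))
    return lines
```

With a two-entry subject tree this shows everything at once; with a university-sized tree it falls back to the click-to-open style.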


fully expanded EPrints subjects menu

The EPrints server clearly expects that the subject tree will be far bigger, as it would be on a University-wide installation. Although even if the list is very large the number of items used by an individual would be small.

So as general design advice, if there is some form of collection:

  • are there any absolute lower or upper bounds on the size?
  • check, within these absolute bounds, what the interface would be like with 1, 3, 10, 100, 1000 in the collection
  • if the potential collection is large, is the likely size needed for a particular user or situation smaller?

To be fair I am an unusual user with my pretty complete HTML publication lists; if I had no systematic way of keeping my own publications then I would appreciate EPrints more. However, there will be many with word processor lists, so maybe I’m not so unusual. I assume other people just knuckle down and get on with it. So the real problem is that I am an impatient user!

Which brings us to the last and most valuable piece of advice. When it comes to user testing cussed users are worth their weight in gold. Users that are too nice are useless; they cope, they manage and would hate to hurt your feelings by telling you your system is not perfect. So find the nasty users, the impatient users, the ones who complain at the slightest things … they are true treasure.

local URIs … mashing up the desktop

I’ve worried for a while about desktop URLs.

Within the web it is easy to link things together. If I want to refer to my home page I just add a link like this. However, on the desktop things are not so simple and I end up copying chunks of mail messages into the notes field in iCal rather than simply being able to link to the mail message where I arranged the meeting.

Links from the desktop to the web are easy … just use the URL … many desktop applications including mail clients and word processors will allow you to embed clickable links. Indeed it is often easier to link to a web page than to another object on the desktop! However, things get more difficult if you want to link the other way round, from a web page to a local file or resource. In my browser’s favourites I have several links to local files, but you cannot easily do the same if your bookmarks are in a web service like del.icio.us or even my own Snip!t. It is hard to seamlessly weave your desktop into the global web.

A couple of events brought this issue to a head for me.

First at the CHI workshop on PIM entitled the Disappearing Desktop, I asked if anyone knew of work in the area and I heard from Leo Sauermann that they had made some progress on this as part of the Gnowsis project. Their proposal for a Desktop URI Scheme (edited by Leo) is targeted principally at the first of the scenarios above, being able to link between things within the desktop.

The second event was at the AVI workshop on designing multi-touch interaction techniques for coupled public and private displays. During discussions about touch-based interactions such as the Microsoft Surface or Apple iPhone, we considered scenarios where people got together for a meeting (as we were) in a hotel bar (where we split for small group discussion) and had screens on table tops and walls, laptops, tablets, phones … and wanted to seamlessly move material between devices. Clearly an essential requirement for this is some way to identify resources across ad hoc collections of devices.

Finally I was in Athens working with George Lepouras, Akrivi Katifori and others. George had developed a Thunderbird extension to allow Snip!t to snip from mail messages … but while we could snip the text there was no way for the Snip!t page to link back to the mail message. We need full round trip URIs that link desktop and web with no distinction – URIs that can be embedded in a web page and (assuming you have the right permissions and are in an appropriate place) can be clicked and the appropriate mail message, calendar entry or whatever is opened.

Based on this and discussions we had, I drafted a discussion document on globally accessible local URIs. Any feedback very welcome.

Over the summer we hope to put together a demonstrator / reference implementation – if anyone is interested let me know.

HCI and CSCW – is your usability too small

Recently heard some group feedback on our HCI textbook. Nearly all said that they did NOT want any CSCW. I was appalled as considering any sort of user interaction without its surrounding social and organisational settings seems as fundamentally misbegotten as considering a system without its users.

Has the usability world gone mad or is it just that our conception of HCI has become too narrow?

puzzle with pictures

As it was New Year’s Day and it was too wet to go shift earth in the garden I thought I’d play a bit with Professor Alan’s puzzle square. I’ve had a ‘make your own’ version for years, but you had to chop an image into bits, give them special names, etc. Now it works much more easily with any image (try it yourself). These are a couple I made with my own photos:

[two puzzle squares; these need JavaScript]

The key is that I am now using the CSS clip property, which allows you to show selected parts of an image (or in fact any HTML element). This was made a little more complicated due to the fact that the W3C pages for clip give running examples for every other kind of visual effect … but not clip! Googling was a nightmare as it turns up page after page in forums saying “I can’t get clip to work”!

Happily I found seifi.org (a blog that looks like a really great web resource) and a post on Creating Thumbnails Using the CSS Clip Property. This was full of meticulously laid out examples … Mojo Seifi, you are a star!
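For the curious, the arithmetic behind each puzzle tile is roughly as follows. This is a sketch in Python that just builds the style string, not the puzzle’s own JavaScript; `tile_style` and its parameters are invented names. It relies on two facts about clip: it only applies to absolutely positioned elements, and `rect(top, right, bottom, left)` measures all four edges from the element’s own top-left corner.

```python
def tile_style(row, col, slot_row, slot_col, tile_w, tile_h):
    """Inline style showing tile (row, col) of an image at board slot
    (slot_row, slot_col), with all sizes in pixels."""
    # Edges of the visible window, measured from the image's top-left corner.
    top = row * tile_h
    left = col * tile_w
    bottom = top + tile_h
    right = left + tile_w
    # Shift the whole image so the visible window lands on the chosen slot.
    offset_left = (slot_col - col) * tile_w
    offset_top = (slot_row - row) * tile_h
    return (
        "position: absolute; left: %dpx; top: %dpx; "
        "clip: rect(%dpx, %dpx, %dpx, %dpx);"
        % (offset_left, offset_top, top, right, bottom, left)
    )
```

Every tile uses the same full image; only the clip rectangle and the offset differ, which is why no chopping of the image is needed.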

Usability and Web2.0

Nad did a brilliant guest lecture for our undergraduate HCI class at Lancaster on Monday. His slides and blog about the lecture are at Virtual Chaos. He touched on issues of democracy vs. authority of information, dynamic content vs. accessibility and of course increasing issues of privacy on social networking sites. He also had awesome slides too, using loads of Flickr photos under creative commons … community content in action not just words! Of course he also touched on Web3.0 and future convergence between emergent community phenomena and structured Semantic Web technologies.