web ephemera and web privacy

Yesterday I was twittering about a web page I’d visited on the BBC1 and the tweet also became my Facebook status2.  Yanni commented on it, not because of the content of the link, but because he noticed the ‘is.gd’ url was very compact.  Thinking about this has some interesting implications for privacy/security and the kind of things you might to use different url shortening schemes for, but also led me to develop an interesting time-wasting application ‘LuckyDip‘ (well if ‘develop’ is the right word as it was just 20-30 mins hacking!).

I used the ‘is.gd’ shortening because it was one of three schemes offered by twirl, the twitter client I use.  I hadn’t actually noticed that it was significantly shorter than the others or indeed tinyurl, which is what I might have thought of using without twirl’s interface.

Here is the url of this blog <http://www.alandix.com/blog/> shortened by is.gd and three other services:

snurl:   http://snurl.com/5ot5k
twurl:  http://twurl.nl/ftgrwl
tinyurl:  http://tinyurl.com/5j98ao
is.gd:  http://is.gd/7OtF

The is.gd link is small for two reasons:

  1. ‘is.gd’ is about as short as you can get with a domain name!
  2. the ‘key’ bit after the domain is only four characters as opposed to 5 (snurl) or 6 (twurl, tinyurl)

The former is just clever domain choice, hard to get something short at all, let alone short and meaningful3.

The latter however is as a result of a design choice at is.gd.  The is.gd urls are allocated sequentially, the ‘key’ bit (7OtF) is simply an encoding of the sequence number that was allocated.  In contrast tinyurl seems to do some sort of hash either of the address or maybe of a sequence number.

The side effect of this is that if you simply type in a random key (below the last allocated sequence number) for an is.gd url it will be a valid url.  In contrast, the space of tinyurl is bigger, so ‘in principle’ only about one in a hundred keys will represent real pages … now I say ‘in principle’ because experimenting with tinyurl I find every six character seqeunce I type as a key gets me to a valid page … so maybe they do some sort of ‘closest’ match.

Whatever url shortening scheme you use by their nature the shorter url will be less redundant than a full url – more ‘random’ permutations will represent meaningful items.  This is a natural result of any ‘language’, the more concise you are the less redundant the language.

At a practical level this means that if you use a shortened url, it is more likely that someone  typing in a random is.gd (or tinyurl) key will come across your page than if they just type a random url.  Occasionally I upload large files I want to share to semi-private urls, ones that are publicly available, but not linked from anywhere.  Because they are not linked they cannot be found through search engines and because urls are long it would be highly unlikely that someone typing randomly (or mistyping) would find them.

If however, I use url shortening to tell someone about it, suddenly my semi-private url becomes a little less private!

Now of course this only matters if people are randomly typing in urls … and why would they do such a thing?

Well a random url on the web is not very interesting in general, there are 100s of millions and most turn out to be poor product or hotel listing sites.  However, people are only likely to share interesting urls … so random choices of shortened urls are actually a lot more interesting than random web pages.

So, just for Yanni, I spent a quick 1/2 hour4 and made a web page/app ‘LuckyDip‘.  This randomly chooses a new page from is.gd every 20 seconds – try it!


successive pages from LuckyDip

Some of the pages are in languages I can’t read, occasionally you get a broken link, and the ones that are readable, are … well … random … but oddly compelling.  They are not the permanently interesting pages you choose to bookmark for later, but the odd page you want to send to someone … often trivia, news items, even (given is.gd is in a twitter client) the odd tweet page on the twitter site.  These are not like the top 20 sites ever, but the ephemera of the web – things that someone at some point thought worth sharing, like overhearing the odd raised voice during a conversation in a train carriage.

Some of the pages shown are map pages, including ones with addresses on … it feels odd, voyeuristic, web curtain twitching – except you don’t know the person, the reason for the address; so maybe more like sitting watching people go by in a crowded town centre, a child cries, lovers kiss, someone’s newspaper blows away in the wind … random moments from unknown lives.

In fact most things we regard as private are not private from everyone.  It is easy to see privacy like an onion skin with the inner sanctum, then those further away, and then complete strangers – the further away someone is from ‘the secret’ the more private something is.  This is certainly the classic model in military security.  However, think further and there are many things you would be perfectly happy for a complete stranger to know, but maybe not those a little closer, your work colleagues, your commercial competitors.  The onion sort of reverses, apart from those that you explicitly want to know, in fact the further out of the onion, the safer it is.  Of course this can go wrong sometimes, as Peter Mandleson found out chatting to a stranger in a taverna (see BBC blog).

So I think LuckyDip is not too great a threat to the web’s privacy … but do watch out what you share with short urls … maybe the world needs a url lengthening service too …

And as a postscript … last night I was trying out the different shortening schemes available from twirl, and accidentally hit return, which created a tweet with the ‘test’ short url in it.  Happily you can delete tweets, and so I thought I had eradicated the blunder unless any twitter followers happened to be watching at that exact moment … but I forgot that my twitter feed also goes to my Facebook status and that deleting the tweet on twitter did not remove the status, so overnight the slip was my Facebook status and at least one person noticed.

On the web nothing stays secret long, and if anything is out there, it is there for ever … and will come back to hant you someday.

  1. This is the tweet “Just saw http://is.gd/7Irv Sad state of the world is that it took me several paragraphs before I realised it was a joke.”[back]
  2. I managed to link them up some time ago, but cannot find again the link on twitter that enabled this, so would be stuck if I wanted to stop it![back]
  3. anyone out there registering Bangaldeshi domains … if ‘is’ is available!![back]
  4. yea it should ave been less, but I had to look up how to access frames in javascript, etc.[back]