Darwinian markets and sub-optimal AI

Do free markets generate the best AI?  Not always, and this not only hits the bottom line, but comes with costs for personal privacy and the environment.  The combination of network effects and winner-takes-all advertising expenditure means that the resulting algorithms may be worst for everyone.

A few weeks ago I was talking with Maria Ferrario (Queens University Belfast) and Emily Winter (Lancaster University) regarding privacy and personal data.  Social media sites and other platforms are using ever more sophisticated algorithms to micro-target advertising.  However, Maria had recently read a paper suggesting that this had gone beyond the point of diminishing returns: far simpler  – and less personally intrusive – algorithms achieve nearly as good performance as the most complex AI.  As well as having an impact on privacy, this will also be contributing to the ever growing carbon impact of digital technology.

At first this seemed counter-intuitive.  While privacy and the environment may not be central concerns, surely companies will not invest more resources in algorithms than is necessary to maximise profit?

However, I then remembered the peacock tail.


Jatin Sindhu, CC BY-SA 4.0, via Wikimedia Commons
The peacock tail is a stunning wonder of nature.  However, in strict survival terms, it appears to be both flagrantly wasteful and positively dangerous – like eye-catching supermarket packaging for the passing predator.

The simple story of Darwinian selection suggests that this should never happen.  The peacocks that have smaller and less noticeable tails should have a better chance of survival, passing their genes to the next generation, leading over time to more manageable and less bright plumage.  In computational terms, evolution acts as a slow, but effective optimisation algorithm, moving a population ever closer to a perfect fit with its environment.

However, this simple story has twists, notably runaway sexual selection.  The story goes like this.  Many male birds develop brighter plumage during mating season so that they are more noticeable to females.  This carries a risk of being caught by a predator, but there is a balance between the risks of being eaten and the benefits of copulation.  Stronger, faster males are better able to fight off or escape a predator, and hence can afford to have slightly more gaudy plumage.  Thus, for the canny female, brighter plumage is a proxy indicator of a more fit potential mate.  As this becomes more firmly embedded into the female selection process, there is an arms race between males – those with less bright plumage will lose out to those with brighter plumage and hence die out.  The baseline constantly increases.

Similar things can happen in free markets, which are often likened to Darwinian competition.

Large platforms such as Facebook or Google make the majority of their income through advertising.  Companies with large advertising accounts are constantly seeking the best return on their marketing budgets and will place ads on the platform that offers the greatest impact (often measured by click-through) for the least expenditure.  Rather like mating, this is a winner-takes-all choice.  If Facebook’s advertising is 1% more effective than Google’s  a canny advertiser will place all their adverts with Facebook and vice versa.  Just like the peacock there is an existential need to outdo each other and thus almost no limit on the resources that should be squandered to gain that elusive edge.

In practice there are modifying factors; the differing demographics of platforms mean that one or other may be better for particular companies and also, perhaps most critically, the platform can adjust its pricing to reflect the effectiveness so that click-through-per-dollar is similar.

The latter is the way the hidden hand of the free market is supposed to operate to deliver ‘optimal’ productivity.  If spending 10% more on a process can improve productivity by 11% you will make the investment.  However, the theory of free markets (to the extent that it ever works) relies on an ‘ideal’ situation with perfect knowledge, free competition and low barriers to entry.  Many countries operate collusion and monopoly laws in pursuit of this ‘ideal’ market.

Digital technology does not work like this. 

For many application areas, network effects mean that emergent monopolies are almost inevitable.  This was first noticed for software such as Microsoft Office – if all my collaborators use Office then it is easier to share documents with them if I use Office also.  However, it becomes even more extreme with social networks – although there are different niches, it is hard to have multiple Facebooks, or at least to create a new one – the value of the platform is because all one’s friends use it.

For the large platforms this means that a substantial part of their expenditure is based on maintaining and growing this service (social network, search engine, etc.).  While the income is obtained from advertising, only a small proportion of the costs are developing and running the algorithms that micro-target adverts.

Let’s assume that the ratio of platform to advertising algorithm costs is 10:1 (I suspect it is a lot greater).  Now imagine platform P develops an algorithm that uses 50% more computational power, but improves advertising targeting effectiveness by 10%; at first this sounds a poor balance, but remember that 10:1 ratio.

The platform can charge 10% more whilst being competitive.   However, the 50% increase in advertising algorithm costs is just 5% of the overall company running costs, as 90% are effectively fixed costs of maintaining the platform.  A 5% increase in costs has led to a 10% increase in corporate income.  Indeed one could afford to double the computational costs for that 10% increase in performance and still maintain profitability.

Of course, the competing platforms will also work hard to develop ever more sophisticated (and typically privacy reducing and carbon producing) algorithms, so that those gains will be rapidly eroded, leading to the next step.

In the end there are diminishing returns for effective advertising: there are only so many eye-hours and dollars in users’ pockets. The 10% increase in advertising effectiveness is not a real productivity gain, but is about gaining a temporary increase in users’ attention, given the current state of competing platforms’ advertising effectiveness.

Looking at the system as a whole, more and more energy and expenditure are spent on algorithms that are ever more invasive of personal autonomy, and in the end yield no return for anyone.

And it’s not even a beautiful extravagance.

If the light is on, they can hear (and now see) you

hello-barbie-matel-from-guardianFollowing Samsung’s warning that its television sets can listen into your conversations1, and Barbie’s, even more scary, doll that listens to children in their homes and broadcasts this to the internet2, the latest ‘advances’ make it possible to be seen even when the curtains are closed and you thought you were private.

For many years it has been possible for security services, or for that matter sophisticated industrial espionage, to pick up sounds based on incandescent light bulbs.

The technology itself is not that complicated, vibrations in the room are transmitted to the filament, which minutely changes its electrical characteristics. The only complication is extracting the high-frequency signal from the power line.

040426-N-7949W-007However, this is a fairly normal challenge for high-end listening devices. Years ago when I was working with submarine designers at Slingsby, we were using the magnetic signature of power running through undersea cables to detect where they were for repair. The magnetic signatures were up to 10,000 times weaker than the ‘noise’ from the Earth’s own magnetic field, but we were able to detect the cables with pin-point accuracy3. Military technology for this is far more advanced.

The main problem is the raw computational power needed to process the mass of data coming from even a single lightbulb, but that has never been a barrier for GCHQ or the NSA, and indeed, with cheap RaspberryPi-based super-computers, now not far from the hobbyist’s budget4.

Using the fact that each lightbulb reacts slightly differently to sound, means that it is, in principle, possible to not only listen into conversations, but work out which house and room they come from by simply adding listening equipment at a neighbourhood sub-station.

The benefits of this to security services are obvious. Whereas planting bugs involves access to a building, and all other techniques involve at least some level of targeting, lightbulb-based monitoring could simply be installed, for example, in a neighbourhood known for extremist views and programmed to listen for key words such as ‘explosive’.

For a while, it seemed that the increasing popularity of LED lightbulbs might end this. This is not because LEDs do not have an electrical response to vibrations, but because of the 12V step down transformers between the light and the mains.

Of course, there are plenty of other ways to listen into someone in their home, from obvious bugs to laser-beams bounced of glass (you can even get plans to build one of your own at Instructables), or even, as MIT researchers recently demonstrated at SIGGRAPH, picking up the images of vibrations on video of a glass of water, a crisp packet, and even the leaves of a potted plant5. However, these are all much more active and involve having an explicit suspect.

Similarly blanket internet and telephone monitoring have applications, as was used for a period to track Osama bin Laden’s movements6, but net-savvy terrorists and criminals are able to use encryption or bypass the net entirely by exchanging USB sticks.

However, while the transformer attenuates the acoustic back-signal from LEDs, this only takes more sensitive listening equipment and more computation, a lot easier than a vibrating pot-plant on video!

So you might just think to turn up the radio, or talk in a whisper. Of course, as you’ve guessed by now, and, as with all these surveillance techniques, simply yet more computation.

Once the barriers of LEDs are overcome, they hold another surprise. Every LED not only emits light, but acts as a tiny, albeit inefficient, light detector (there’s even an Arduino project to use this principle).   The output of this is a small change in DC current, which is hard to localise, but ambient sound vibrations act as a modulator, allowing, again in principle, both remote detection and localisation of light.

220px-60_LED_3W_Spot_Light_eq_25WIf you have several LEDs, they can be used to make a rudimentary camera7. Each LED lightbulb uses a small array of LEDs to create a bright enough light. So, this effectively becomes a very-low-resolution video camera, a bit like a fly’s compound eye.

While each image is of very low quality, any movement, either of the light itself (hanging pendant lights are especially good), or of objects in the room, can improve the image. This is rather like the principle we used in FireFly display8, where text mapped onto a very low-resolution LED pixel display is unreadable when stationary, but absolutely clear when moving.

pix-11  pix-21
pix-12  pix-22
LEDs produce multiple very-low-resolution image views due to small vibrations and movement9.

OLYMPUS DIGITAL CAMERA  OLYMPUS DIGITAL CAMERA
Sufficient images and processing can recover an image.

So far MI5 has not commented on whether it uses, or plans to use this technology itself, nor whether it has benefited from information gathered using it by other agencies. Of course its usual response is to ‘neither confirm nor deny’ such things, so without another Edward Snowden, we will probably never know.

So, next time you sit with a coffee in your living room, be careful what you do, the light is watching you.

  1. Not in front of the telly: Warning over ‘listening’ TV. BBC News, 9 Feb 2015. http://www.bbc.co.uk/news/technology-31296188[back]
  2. Privacy fears over ‘smart’ Barbie that can listen to your kids. Samuel Gibbs, The Guardian, 13 March 2015. http://www.theguardian.com/technology/2015/mar/13/smart-barbie-that-can-listen-to-your-kids-privacy-fears-mattel[back]
  3. “Three DSP tricks”, Alan Dix, 1998. https://alandix.com/academic/papers/DSP99/DSP99-full.html[back]
  4. “Raspberry Pi at Southampton: Steps to make a Raspberry Pi Supercomputer”, http://www.southampton.ac.uk/~sjc/raspberrypi/[back]
  5. A. Davis, M. Rubinstein, N. Wadhwa, G. Mysore, F. Durand and W. Freeman (2014). The Visual Microphone: Passive Recovery of Sound from Video. ACM Transactions on Graphics (Proc. SIGGRAPH), 33(4):79:1–79:10 http://people.csail.mit.edu/mrub/VisualMic/[back]
  6. Tracking Use of Bin Laden’s Satellite Phone, all Street Journal, Evan Perez, Wall Street Journal, 28th May, 2008. http://blogs.wsj.com/washwire/2008/05/28/tracking-use-of-bin-ladens-satellite-phone/[back]
  7. Blinkenlight, LED Camera. http://blog.blinkenlight.net/experiments/measurements/led-camera/[back]
  8. Angie Chandler, Joe Finney, Carl Lewis, and Alan Dix. 2009. Toward emergent technology for blended public displays. In Proceedings of the 11th international conference on Ubiquitous computing (UbiComp ’09). ACM, New York, NY, USA, 101-104. DOI=10.1145/1620545.1620562[back]
  9. Note using simulated images; getting some real ones may be my next Tiree Tech Wave project.[back]

big brother Is watching … but doing it so, so badly

I followed a link to an article on Forbes’ web site1.  After a few moments the computer fan started to spin like a merry-go-round and the page, and the browser in general became virtually unresponsive.

I copied the url, closed the browser tab (Firefox) and pasted the link into Chrome, as Chrome is often billed for its stability and resilience to badly behaving web pages.  After a  few moments the same thing happened, roaring fan, and, when I peeked at the Activity Monitor, Chrome was eating more than a core worth of the machine’s CPU.

I dug a little deeper and peeked at the web inspector.  Network activity was haywire hundreds and hundreds of downloads, most were small, some just a  few hundred bytes, others a few Kb, but loads of them.  I watched mesmerised.  Eventually it began to level off after about 10 minutes when the total number of downloads was nearing 1700 and 8Mb total download.

 

It is clear that the majority of these are ‘beacons’, ‘web bugs’, ‘trackers’, tiny single pixel images used by various advertising, trend analysis and web analytics companies.  The early beacons were simple gifs, so would download once and simply tell the company what page you were on, and hence using this to tune future advertising, etc.

However, rather than simply images that download once, clearly many of the current beacons are small scripts that then go on to download larger scripts.  The scripts they download then periodically poll back to the server.  Not only can they tell their originating server that you visited the page, but also how long you stayed there.  The last url on the screenshot above is one of these report backs rather than the initial download; notice it telling the server what the url of the current page is.

Some years ago I recall seeing a graphic showing how many of these beacons common ‘quality’ sites contained – note this is Forbes.  I recall several had between one and two hundred on a single page.  I’m not sure the actual count here as each beacon seems to create very many hits, but certainly enough to create 1700 downloads in 10 minutes.  The chief culprits, in terms of volume, seemed to be two companies I’d not heard of before SimpleReach2 and Realtime3, but I also saw Google, Doubleclick and others.

While I was not surprised that these existed, the sheer volume of activity did shock me, consuming more bandwidth than the original web page – no wonder your data allowance disappears so fast on a mobile!

In addition the size of the JavaScript downloads suggests that there are doing more than merely report “page active”, I’m guessing tracking scroll location, mouse movement, hover time … enough to eat a whole core of CPU.

I left the browser window and when I returned, around an hour later, the activity had slowed down, and only a couple of the sites were still actively polling.  The total bandwidth had climbed another 700Kb, so around 10Kb/minute – again think about mobile data allowance, this is a web page that is just sitting there.

When I peeked at the activity monitor Chrome had three highly active processes, between them consuming 2 cores worth of CPU!  Again all on a web page that is just sitting there.  Not only are these web beacons spying on your every move, but they are badly written to boot, costuming vast amounts of CPU when there is nothing happening.

I tried to scroll the page and then, surprise, surprise:

So, I will avoid links to Forbes in future, not because I respect my privacy; I already know I am tracked and tracked; who needed Snowdon to tell you that?  I won’t go because the beacons make the site unusable.

I’m guessing this is partly because the network here on Tiree is slow.  It does not take 10 minutes to download 8Mb, but the vast numbers of small requests interact badly with the network characteristics.  However, this is merely exposing what would otherwise be hidden: the vast ratio between useful web page and tracking software, and just how badly written the latter is.

Come on Forbes, if you are going to allow spies to pay to use your web site, at least ask them to employ some competent coders.

  1. The page I was after was this one, but I’d guess any news page would be the same. http://www.forbes.com/sites/richardbehar/2014/08/21/the-media-intifada-bad-math-ugly-truths-about-new-york-times-in-israel-hamas-war/[back]
  2. http://www.simplereach.com/[back]
  3. http://www.realtime.co/[back]

web ephemera and web privacy

Yesterday I was twittering about a web page I’d visited on the BBC1 and the tweet also became my Facebook status2.  Yanni commented on it, not because of the content of the link, but because he noticed the ‘is.gd’ url was very compact.  Thinking about this has some interesting implications for privacy/security and the kind of things you might to use different url shortening schemes for, but also led me to develop an interesting time-wasting application ‘LuckyDip‘ (well if ‘develop’ is the right word as it was just 20-30 mins hacking!).

I used the ‘is.gd’ shortening because it was one of three schemes offered by twirl, the twitter client I use.  I hadn’t actually noticed that it was significantly shorter than the others or indeed tinyurl, which is what I might have thought of using without twirl’s interface.

Here is the url of this blog <http://www.alandix.com/blog/> shortened by is.gd and three other services:

snurl:   http://snurl.com/5ot5k
twurl:  http://twurl.nl/ftgrwl
tinyurl:  http://tinyurl.com/5j98ao
is.gd:  http://is.gd/7OtF

The is.gd link is small for two reasons:

  1. ‘is.gd’ is about as short as you can get with a domain name!
  2. the ‘key’ bit after the domain is only four characters as opposed to 5 (snurl) or 6 (twurl, tinyurl)

The former is just clever domain choice, hard to get something short at all, let alone short and meaningful3.

The latter however is as a result of a design choice at is.gd.  The is.gd urls are allocated sequentially, the ‘key’ bit (7OtF) is simply an encoding of the sequence number that was allocated.  In contrast tinyurl seems to do some sort of hash either of the address or maybe of a sequence number.

The side effect of this is that if you simply type in a random key (below the last allocated sequence number) for an is.gd url it will be a valid url.  In contrast, the space of tinyurl is bigger, so ‘in principle’ only about one in a hundred keys will represent real pages … now I say ‘in principle’ because experimenting with tinyurl I find every six character seqeunce I type as a key gets me to a valid page … so maybe they do some sort of ‘closest’ match.

Whatever url shortening scheme you use by their nature the shorter url will be less redundant than a full url – more ‘random’ permutations will represent meaningful items.  This is a natural result of any ‘language’, the more concise you are the less redundant the language.

At a practical level this means that if you use a shortened url, it is more likely that someone  typing in a random is.gd (or tinyurl) key will come across your page than if they just type a random url.  Occasionally I upload large files I want to share to semi-private urls, ones that are publicly available, but not linked from anywhere.  Because they are not linked they cannot be found through search engines and because urls are long it would be highly unlikely that someone typing randomly (or mistyping) would find them.

If however, I use url shortening to tell someone about it, suddenly my semi-private url becomes a little less private!

Now of course this only matters if people are randomly typing in urls … and why would they do such a thing?

Well a random url on the web is not very interesting in general, there are 100s of millions and most turn out to be poor product or hotel listing sites.  However, people are only likely to share interesting urls … so random choices of shortened urls are actually a lot more interesting than random web pages.

So, just for Yanni, I spent a quick 1/2 hour4 and made a web page/app ‘LuckyDip‘.  This randomly chooses a new page from is.gd every 20 seconds – try it!


successive pages from LuckyDip

Some of the pages are in languages I can’t read, occasionally you get a broken link, and the ones that are readable, are … well … random … but oddly compelling.  They are not the permanently interesting pages you choose to bookmark for later, but the odd page you want to send to someone … often trivia, news items, even (given is.gd is in a twitter client) the odd tweet page on the twitter site.  These are not like the top 20 sites ever, but the ephemera of the web – things that someone at some point thought worth sharing, like overhearing the odd raised voice during a conversation in a train carriage.

Some of the pages shown are map pages, including ones with addresses on … it feels odd, voyeuristic, web curtain twitching – except you don’t know the person, the reason for the address; so maybe more like sitting watching people go by in a crowded town centre, a child cries, lovers kiss, someone’s newspaper blows away in the wind … random moments from unknown lives.

In fact most things we regard as private are not private from everyone.  It is easy to see privacy like an onion skin with the inner sanctum, then those further away, and then complete strangers – the further away someone is from ‘the secret’ the more private something is.  This is certainly the classic model in military security.  However, think further and there are many things you would be perfectly happy for a complete stranger to know, but maybe not those a little closer, your work colleagues, your commercial competitors.  The onion sort of reverses, apart from those that you explicitly want to know, in fact the further out of the onion, the safer it is.  Of course this can go wrong sometimes, as Peter Mandleson found out chatting to a stranger in a taverna (see BBC blog).

So I think LuckyDip is not too great a threat to the web’s privacy … but do watch out what you share with short urls … maybe the world needs a url lengthening service too …

And as a postscript … last night I was trying out the different shortening schemes available from twirl, and accidentally hit return, which created a tweet with the ‘test’ short url in it.  Happily you can delete tweets, and so I thought I had eradicated the blunder unless any twitter followers happened to be watching at that exact moment … but I forgot that my twitter feed also goes to my Facebook status and that deleting the tweet on twitter did not remove the status, so overnight the slip was my Facebook status and at least one person noticed.

On the web nothing stays secret long, and if anything is out there, it is there for ever … and will come back to hant you someday.

  1. This is the tweet “Just saw http://is.gd/7Irv Sad state of the world is that it took me several paragraphs before I realised it was a joke.”[back]
  2. I managed to link them up some time ago, but cannot find again the link on twitter that enabled this, so would be stuck if I wanted to stop it![back]
  3. anyone out there registering Bangaldeshi domains … if ‘is’ is available!![back]
  4. yea it should ave been less, but I had to look up how to access frames in javascript, etc.[back]