tread lightly — controlling user experience pollution

When thinking about usability or user experience, it is easy to focus on the application in front of us, but the way it impacts its environment may sometimes be far more critical. However, designing applications that are friendly to their environment (digital and physical) may require deep changes to the low-level operating systems.

I’m writing this post effectively ‘offline’ into a word processor for later upload. I sometimes do this as I find it easier to write without the distractions of editing within a web browser, or because I am physically disconnected from the Internet. However, now I am connected, and indeed I can see I am connected as an FTP file upload is progressing; it is just that anything else network-related is stalled.

The reason that the FTP upload is ‘hogging’ the network is, I believe, due to a quirk in the UNIX scheduling system, which was, paradoxically, originally intended to improve interactivity.

UNIX, which sits underneath Mac OS, is a multitasking operating system running many programs at once. Each process has a priority, governed in part by its ‘niceness’, which can be set explicitly, but the effective priority is also tweaked from moment to moment by the operating system. One of the rules for ‘tweaking’ it is that if a process is IO-bound, that is if it is constantly waiting for input or output, then it is treated as ‘nicer’, meaning that it is given higher priority.

The reason for this rule is partly to enhance interactive performance in the old days of command line interfaces; an interactive program would spend lots of time waiting for the user to enter something, and so its priority would increase meaning it would respond quickly as soon as the user entered anything. The other reason is that CPU time was seen as the scarce resource, so that processes that were IO bound were effectively being ‘nicer’ to other processes as they let them get a share of the precious CPU.
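The explicit side of niceness can be seen from a few lines of Python. This is only a minimal sketch, assuming a UNIX-like system (Mac OS or Linux); it shows a process volunteering to be ‘nicer’, not the moment-to-moment tweaking the scheduler does internally, which is not visible this way.

```python
import os

# os.nice(increment) adds `increment` to this process's niceness and
# returns the new value; an increment of 0 simply reads it back.
before = os.nice(0)

# Ask to be 5 steps "nicer" (lower priority). An unprivileged process may
# raise its niceness like this but normally cannot lower it again.
after = os.nice(5)

print(f"niceness before: {before}, after: {after}")
```

Note that this only affects the share of CPU the process gets; it says nothing about how much of the network or disk it may use, which is exactly the gap described here.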

The FTP program is simply sitting there shunting out data to the network, so is almost permanently blocked waiting for the network as it can read from the disk faster than the network can transmit data. This means UNIX regards it as ‘nice’ and ups its priority. As soon as the network clears sufficiently, the FTP program is rescheduled and puts more into the network queue, reading the next chunks from disk until the network is again full to capacity. Nothing else gets a chance: no web, no email, not even a network trace utility.

I’ve seen the same before with a database server on one of Fiona’s machines (all my fault). The MySQL manual suggested that you disable indices before large bulk updates (e.g. ingesting a file of data) and then re-enable them once the update is finished, as indexing is more efficient on lots of data than on one row at a time. I duly did this and forgot about it until Fiona noticed something was wrong on the server and web traffic had ground to a near halt. When she opened a console on the server, she found that it seemed quiet, with very little CPU load at all, and was puzzled until I realised it was my indexing. Indexing requires a lot of reading and writing data to and from disk, so MySQL became IO-bound and was given higher priority; as soon as the disk was free it was rescheduled, hit the disk once more … just as FTP is now hogging the network, MySQL hogged the disk and nothing else could read or write. Of course MySQL’s own performance was fine as it internally interleaved queries with indexing; it was just everything else on the system that failed.

These are hard scenarios to design for. I have written before (“why software need never hang”) about the way application designers do not think sufficiently about potential delays due to slow networks or broken connections. However, that was about the applications that are suffering. Here the issue is not that the FTP program is badly designed for its delays (it is still responding very happily), just that it has had a knock-on effect on the rest of the system. It is like cleaning your sink with industrial bleach: you have a clean house within, but pollute the watercourse without.

These kinds of issues are not related solely to network and disk. Any kind of resource is limited, and profligacy causes damage in the digital world as much as in the physical environment.

Some years ago I had a Symbian smartphone, but it proved unusable as its battery life rarely exceeded 40 minutes from full charge. I thought I had a duff battery, but later realised it was because I was leaving applications on the phone ‘open’. I would go to the address book, look up a number, and that was that; I then maybe turned the phone off or switched to something else without ‘exiting’ the address book. I was treating the phone like every previous phone I had used, but this one was different: it had a ‘real’ operating system. Opening the address book launched the address book application, which then kept on running (and using power) until it was explicitly closed, a model that is maybe fine for permanently plugged-in computers, but disastrous for a mobile phone.

When early iPhones came out, iOS was criticised for being single-tasking, that is not having lots of things running in the ‘background’. However, this undoubtedly helped its battery life. Now, with newer versions of iOS, it has changed and there are lots of apps running at once, and I have noticed the battery life reducing. Is that simply the battery wearing out with age, or the effect of all those apps running?

Power is of course not just a problem for smartphones, but for any laptop. I try to close down applications on my Mac when I am working without power as I know some programs just eat CPU when they are apparently idle (yes, Firefox, it’s you I’m talking about). And from an environmental point of view, lower power consumption when connected would also be good. My hope was that Apple would take the lessons learnt in the early iOS to change the nature of their mainstream OS, but sadly they succumbed to the pressure to make iOS a ‘proper’ OS!

Of course the FTP program could try to be friendly, perhaps deliberately throttling its network activity when it is not the selected window. But then the 4 hour upload would take 8 hours; instead of the 20 minutes left at this point, I’d be looking forward to another 4 hours and 20 minutes, and I’d be complaining about that.
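To make the ‘friendly’ option concrete, here is a rough sketch of what throttling inside the application might look like. It is not how any real FTP client works; the function name `throttled_copy` and its parameters are made up for illustration, and the clock is simulated so the example runs instantly.

```python
import io
import time


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, pausing so the average rate stays under the cap."""
    sent = 0
    start = clock()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        sent += len(chunk)
        # Earliest time at which this many bytes are allowed to have gone out.
        due = start + sent / max_bytes_per_sec
        delay = due - clock()
        if delay > 0:
            sleep(delay)
    return sent


# Simulated clock: "sleeping" just advances a counter.
now = [0.0]
fake_clock = lambda: now[0]
def fake_sleep(seconds):
    now[0] += seconds

src = io.BytesIO(b"x" * 1000)
dst = io.BytesIO()
copied = throttled_copy(src, dst, max_bytes_per_sec=100, chunk_size=250,
                        clock=fake_clock, sleep=fake_sleep)
print(copied, "bytes in", now[0], "simulated seconds")
```

The weakness is plain from the code: the cap is fixed, so the upload is slowed whether or not anything else actually wants the network.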

The trouble is that there needs to be better communication, more knowledge shared, between application and operating system. I would like FTP to use all the network capacity that it can, except when I am interacting with some other program. Either FTP needs to say to the OS “hey here’s a packet, send it when there’s a gap” [1], or the OS needs some way for applications to determine current network state and make decisions based on that. Sometimes this sort of information is easily available, more often it is either very hard to get at or not available at all.
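The “send it when there’s a gap” idea can be sketched as a toy model. This is purely illustrative: `LinkScheduler` and its methods are hypothetical names, not an interface that any operating system exposes, and a real network queue is far more complicated.

```python
from collections import deque


class LinkScheduler:
    """Toy network queue with a 'send when there's a gap' traffic class."""

    def __init__(self):
        self.interactive = deque()  # web, email: things the user is waiting on
        self.background = deque()   # bulk transfers that are happy to wait

    def submit(self, packet, background=False):
        queue = self.background if background else self.interactive
        queue.append(packet)

    def next_packet(self):
        """Choose the packet for the next free transmission slot."""
        if self.interactive:
            return self.interactive.popleft()
        if self.background:
            return self.background.popleft()
        return None


link = LinkScheduler()
for i in range(3):
    link.submit(f"ftp-{i}", background=True)  # the upload queues up first
link.submit("web-request")
link.submit("email")

order = []
while (packet := link.next_packet()) is not None:
    order.append(packet)
print(order)
```

Even though the upload got its packets in first, the interactive traffic goes out ahead of it, and the upload still gets every slot that nobody else wants.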

I recall years ago when internet access was still mainly through pay-per-minute dial-up connections. You could set your PC to automatically dial when the internet was needed. However, some programs, such as chat, would periodically check with a central server to see if there was activity, and this would cause the PC to dial up the ISP. If you were lucky the PC also had an auto-disconnect after a period of inactivity; if you were not lucky the PC would connect at 2am and by the morning you’d find yourself with a phone bill more than your week’s wages.

When we were designing onCue at aQtive, we wanted to be able to connect to the Internet when it was available, but avoid bankrupting our users. Clearly, somewhere deep down in the TCP/IP stack (the layers of code over the network), the system knew whether we were connected. I recall we found a very helpful function in the Windows API called something like “isConnected” [2]. Unfortunately, it worked by attempting to send a network packet and returning true if it succeeded and false if it failed. Of course sending the test packet caused the PC to auto-dial …
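The difference between the check we wanted and the check we got can be shown with a small model. Everything here is hypothetical: `DialUpLink` stands in for a PC with auto-dial switched on, and neither function is the real Windows call.

```python
class DialUpLink:
    """Hypothetical stand-in for a PC with auto-dial enabled."""

    def __init__(self):
        self.connected = False
        self.dial_count = 0

    def send(self, data):
        # Auto-dial: any outgoing traffic brings the line up first.
        if not self.connected:
            self.connected = True
            self.dial_count += 1
        return True


def is_connected_by_probe(link):
    """Active check: send a test packet and report whether it worked."""
    return link.send(b"ping")


def is_connected_passive(link):
    """Passive check: read the state the stack already holds."""
    return link.connected


link = DialUpLink()
passive_answer = is_connected_passive(link)  # the line stays down
probe_answer = is_connected_by_probe(link)   # "connected", because it dialled
print(passive_answer, probe_answer, link.dial_count)
```

The probe always answers yes, but only because asking the question changed the answer, and ran up the phone bill in the process.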

And now there is just 1 minute and 53 seconds left on the upload, so time to finish this post before I get on to garbage collection.

  1. This form of “send when you can” would also be useful in cellular networks, for example when syncing photos.
  2. I had a quick peek, and found that Windows CE has a function called InternetGetConnectedState. I don’t know if this works better now.