Tag Archives: web
Timing matters!
How long is an instant? The answer, of course, is ‘it depends’, but I’ve been finding it fascinating playing on the demo page for AngularJS tooltips and seeing what feels like ‘instant’ for a tooltip.
The demo allows you to adjust the md-delay property so you can change the delay between hovering over a button and the tooltip appearing, and then instantly see what that feels like.
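For anyone wanting to try the same thing in their own page, the markup involved is roughly as follows (a sketch based on Angular Material 1.x; the button, text and 150ms delay are just illustrative):

<!-- illustrative only: md-delay is the time in milliseconds before the tooltip shows -->
<md-button>
  Save
  <md-tooltip md-delay="150">Save your changes</md-tooltip>
</md-button>

Dropping the delay towards zero makes the tooltip feel ‘instant’; pushing it up makes it feel more deliberate.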
Solr Rocks!
After struggling with large FULLTEXT indexes in MySQL, Solr comes to the rescue: 16 million records ingested in 20 minutes – wow!
One small gotcha was the security classes, which have obviously moved since the documentation was written (see the fix at the end of the post).
For web apps I live off MySQL, albeit nowadays often wrapped with my own NoSQLite libraries to do Mongo-style databases over the LAMP stack. I’d also recently had a successful experience using MySQL FULLTEXT indices with a smaller database (10s of thousands of records) for the HCI Book search. So when I wanted to index 16 million book titles with their author names from OpenLibrary I thought I might as well have a go.
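For anyone who has not used MySQL full-text search, the kind of thing involved is roughly as follows (a sketch only; the table and column names are illustrative, not the actual HCI Book or OpenLibrary schema):

-- illustrative only: a MyISAM table with a FULLTEXT index over title and author
CREATE TABLE book (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title  VARCHAR(512) NOT NULL,
    author VARCHAR(256) NOT NULL,
    FULLTEXT INDEX ft_title_author (title, author)
) ENGINE=MyISAM;

-- ... and the kind of query such an index supports
SELECT id, title, author
  FROM book
 WHERE MATCH(title, author) AGAINST('alice wonderland' IN NATURAL LANGUAGE MODE);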
For some MySQL table types, the normal recommendation used to be to insert records without an index and add the index later. However, in the past I have had a very bad experience with this approach as there doesn’t appear to be a way to tell MySQL to go easy with this process – I recall the disk being absolutely thrashed and Fiona having to restart the web server 🙁
Happily, Ernie Souhrada reports that for MyISAM tables incremental inserts with an index are no worse than bulk insert followed by adding the index. So I went ahead and set off a script adding batches of 10,000 records at a time, with small gaps ‘just in case’. The just in case was definitely the case and 16 hours later I’d barely managed a million records and MySQL was getting slower and slower.
I cut my losses, tried an upload without the FULLTEXT index and, 20 minutes later, that was fine … but no way did I dare run that ‘CREATE FULLTEXT’!
In my heart I knew that Lucene/Solr was the right way to go. These are designed for search-engine performance, but I dreaded the pain of trying to install and come up to speed with yet another system that might not end up any better.
However, I bit the bullet, and my dread was utterly unfounded. Fiona got the right version of Java running and then within half an hour of downloading Solr I had it up and running with one of the examples. I then tried experimental ingests with small chunks of the data: 1000 records, 10,000 records, 100,000 records, a million records … Solr lapped it up, utterly painless. The only fix I needed was because my tab-separated records had quote characters that needed mangling.
So, a quick split into million-record chunks (I couldn’t bring myself to do a single multi-gigabyte POST … but maybe that would have been OK!), set the ingest going and 20 minutes later – hey presto, 16 million full-text indexed records 🙂 I then realised I’d forgotten to give field names, so the ingest had taken the first record’s values as a header line. No problem: just clear the database and re-ingest … at 20 minutes for the whole thing, who cares!
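For the record, the ingest itself was just HTTP POSTs of the tab-separated chunks; something along these lines (a sketch: the core name, field names, file name and port are assumptions, not the actual setup):

# send one tab-separated chunk to Solr's CSV/TSV update handler
# separator=%09 is the tab character; header=false because the chunks have no header line
curl 'http://localhost:8983/solr/openlibrary/update?commit=true&separator=%09&header=false&fieldnames=title,author' \
     --data-binary @books_chunk_01.tsv \
     -H 'Content-type: text/csv; charset=utf-8'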
As noted, there was one slight gotcha. The Securing Solr section of the Solr Reference Guide explains how to set up the security.json file. This kept failing until I realised it was failing to find the classes solr.BasicAuthPlugin and solr.RuleBasedAuthorizationPlugin (solr.log is your friend!). After a bit of listing of the contents of jars, I found that these are now in org.apache.solr.security. I also found that the JSON parser struggled a little with indents – I think maybe tab characters – but after explicitly selecting the whitespace and re-typing it as spaces, yay! I have a fully secured Solr instance with 16 million book titles – wow 🙂
This is my final security.json file (actual credentials obscured, of course!):
{ "authentication":{ "blockUnknown": true, "class":"org.apache.solr.security.BasicAuthPlugin", "credentials":{ "tom":"blabbityblabbityblabbityblabbityblabbityblo= blabbityblabbityblabbityblabbityblabbityblo=", "dick":"blabbityblabbityblabbityblabbityblabbityblo= blabbityblabbityblabbityblabbityblabbityblo=", "harry":"blabbityblabbityblabbityblabbityblabbityblo= blabbityblabbityblabbityblabbityblabbityblo="}, }, "authorization":{"class":"org.apache.solr.security.RuleBasedAuthorizationPlugin"} }
big brother is watching … but doing it so, so badly
I followed a link to an article on Forbes’ web site1. After a few moments the computer fan started to spin like a merry-go-round and the page, and the browser in general became virtually unresponsive.
I copied the url, closed the browser tab (Firefox) and pasted the link into Chrome, as Chrome is often praised for its stability and resilience in the face of badly behaving web pages. After a few moments the same thing happened: roaring fan, and, when I peeked at the Activity Monitor, Chrome was eating more than a core’s worth of the machine’s CPU.
I dug a little deeper and peeked at the web inspector. Network activity was haywire: hundreds and hundreds of downloads, most of them small, some just a few hundred bytes, others a few Kb, but loads of them. I watched, mesmerised. Eventually it began to level off after about 10 minutes, when the total number of downloads was nearing 1700 and 8Mb total download.
It is clear that the majority of these are ‘beacons’, ‘web bugs’, ‘trackers’: tiny single-pixel images used by various advertising, trend-analysis and web-analytics companies. The early beacons were simple gifs, so they would download once and simply tell the company what page you were on, which could then be used to tune future advertising, etc.
However, rather than simply images that download once, clearly many of the current beacons are small scripts that then go on to download larger scripts. The scripts they download then periodically poll back to the server. Not only can they tell their originating server that you visited the page, but also how long you stayed there. The last url on the screenshot above is one of these report backs rather than the initial download; notice it telling the server what the url of the current page is.
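To make that concrete, the behaviour is consistent with something along the following lines (a purely illustrative sketch; the endpoint and timings are made up, not taken from any of the actual trackers):

// illustrative sketch of a report-back beacon (hypothetical endpoint and interval)
(function() {
    var endpoint = 'https://tracker.example.com/ping';      // hypothetical tracker URL
    function report() {
        var img = new Image();                               // the classic image-beacon trick
        img.src = endpoint
            + '?page=' + encodeURIComponent(location.href)   // tell the server which page this is
            + '&t=' + Date.now();                            // and when, so it can work out dwell time
    }
    report();                       // the initial "page visited" hit
    setInterval(report, 15000);     // poll back periodically – hence the continuing network activity
})();

Multiply that by every tracker on the page and the volume of requests adds up quickly.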
Some years ago I recall seeing a graphic showing how many of these beacons common ‘quality’ sites contained – note this is Forbes. I recall several had between one and two hundred on a single page. I’m not sure of the actual count here, as each beacon seems to create very many hits, but certainly enough to create 1700 downloads in 10 minutes. The chief culprits, in terms of volume, seemed to be two companies I’d not heard of before, SimpleReach2 and Realtime3, but I also saw Google, Doubleclick and others.
While I was not surprised that these existed, the sheer volume of activity did shock me, consuming more bandwidth than the original web page – no wonder your data allowance disappears so fast on a mobile!
In addition, the size of the JavaScript downloads suggests that they are doing more than merely reporting “page active”; I’m guessing tracking scroll location, mouse movement, hover time … enough to eat a whole core of CPU.
I left the browser window and when I returned, around an hour later, the activity had slowed down, and only a couple of the sites were still actively polling. The total bandwidth had climbed another 700Kb, so around 10Kb/minute – again think about mobile data allowance, this is a web page that is just sitting there.
When I peeked at the Activity Monitor, Chrome had three highly active processes, between them consuming two cores’ worth of CPU! Again, all on a web page that is just sitting there. Not only are these web beacons spying on your every move, but they are badly written to boot, consuming vast amounts of CPU when there is nothing happening.
I tried to scroll the page and then, surprise, surprise:
So, I will avoid links to Forbes in future, not because I want to protect my privacy (I already know I am tracked and tracked again; who needed Snowden to tell you that?), but because the beacons make the site unusable.
I’m guessing this is partly because the network here on Tiree is slow. It does not take 10 minutes to download 8Mb, but the vast numbers of small requests interact badly with the network characteristics. However, this is merely exposing what would otherwise be hidden: the vast ratio between useful web page and tracking software, and just how badly written the latter is.
Come on Forbes, if you are going to allow spies to pay to use your web site, at least ask them to employ some competent coders.
JavaScript gotcha: var scope
I have been using JavaScript for more than 15 years, with some projects running to several thousand lines, but I have only just discovered that for all these years I have misunderstood the scope rules for variables. I had assumed they were block scoped, but in fact every variable is effectively declared at the beginning of the enclosing function.
So if you write:
function f() {
    for( var i=0; i<10; i++ ){
        var i_squared = i * i;
        // more stuff ...
    }
}
This is treated as if you had written:
function f() {
    var i, i_squared;
    for( i=0; i<10; i++ ){
        i_squared = i * i;
        // more stuff ...
    }
}
The Mozilla Developer Network describes the basic principle in detail; however, it does not include any examples with inner blocks like this.
So, there is effectively a single variable that gets reused every time round the loop. Given you do the iterations one after another this is perfectly fine … until you need a closure.
I had a simple for loop:
function f(items) {
    for( var ix in items ){
        var item = items[ix];
        var value = get_value(item);
        do_something(item,value);
    }
}
This all worked well until I needed to get the value asynchronously (AJAX call) and so turned get_value into an asynchronous function:
get_value_async(item,callback)
which fetches the value and then calls callback(value) when it is ready.
The loop was then changed to
function f(items) {
    for( var ix in items ){
        var item = items[ix];
        get_value_async( item, function(value) {
            do_something(item,value);
        } );
    }
}
I had assumed that ‘item’ in each callback closure would be bound to the value for the particular iteration of the loop, but in fact the effective code is:
function f(items) {
    var ix, item;
    for( ix in items ){
        item = items[ix];
        get_value_async( item, function(value) {
            do_something(item,value);
        } );
    }
}
So all the callbacks point to the same ‘item’, which ends up as the one from the last iteration. In this case the code is updating an onscreen menu, so only the last item got updated!
JavaScript 1.7 and ECMAScript 6 have a new ‘let’ keyword, which has precisely the semantics that I have always thought ‘var’ had, but it does not yet seem to be widely available in browsers.
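For completeness, the ‘let’ version (which I could not rely on at the time) would look like this:

function f(items) {
    for( let ix in items ){
        let item = items[ix];              // a fresh binding each time round the loop
        get_value_async( item, function(value) {
            do_something(item,value);      // each callback now sees its own 'item'
        } );
    }
}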
As a workaround I have used the slightly hacky looking:
function f(items) {
    for( var ix in items ){
        (function() {
            var item = items[ix];
            get_value_async( item, function(value) {
                do_something(item,value);
            } );
        })();
    }
}
The anonymous function immediately inside the for loop is simply there to create scope for the item variable, and effectively means there is a fresh variable to be bound to the innermost function.
It works, but you do need to be confident with anonymous functions!
Offline HTML5, Chrome, and infinite regress
I am using HTML5’s offline mode as part of the Tiree Mobile Archive project.
This is, in principle, a lovely way of creating web sites that behave pretty much like native apps on mobile devices. However, things, as you can guess, do not always go as smoothly as the press releases and blogs suggest!
Some time I must write at length on various useful lessons, but, for now, just one – the potential for an endless cycle of caches, rather like Jörmungandr, the Norse world serpent, that wraps around the world swallowing its own tail.
My problem started when I had a file (which I will call ‘shared.prob’ below, but was actually ‘place_data.js’), which I had updated on the web server, but which kept showing an old version in Chrome no matter how many times I hit refresh, and even after I went to the history settings and asked Chrome to empty its cache.
I eventually got to the bottom of this and it turned out to be this Jörmungandr, cache-eats-cache, problem (browser bug!), but I should start at the beginning …
To make a web site work off-line in HTML5 you simply include a link to an application cache manifest file in the main file’s <html> tag. The browser then pre-loads all of the files mentioned in the manifest to create the application cache (appCache for short). The site is then viewable off-line. If this is combined with off-line storage using the built-in SQLite database, you can have highly functional applications, which can sync to central services using AJAX when connected.
Of course sometimes you have updated files in the site and you would like browsers to pick up the new version. To do this you simply update the files, but then also update the manifest file in some way (often by updating a version number or date in a comment). The browser periodically checks the manifest file when it is next connected (or at least some browsers check it themselves; for some you need to add JavaScript code to do it), and when it notices the manifest has changed it invalidates the appCache and re-checks all the files mentioned in the manifest, downloading the new versions.
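In outline it looks something like this (a minimal sketch; the file names and the version comment are illustrative):

<!-- index.html: point the html element at the manifest file -->
<html manifest="site.appcache">

CACHE MANIFEST
# version 2013-04-02a  -- bump this comment to invalidate the appCache
CACHE:
index.html
js/app.js
css/style.css
images/logo.png

And, for browsers that need a nudge, a JavaScript check along these lines:

// ask the browser to re-check the manifest and swap in a fresh cache when ready
var ac = window.applicationCache;
ac.addEventListener('updateready', function() {
    if (ac.status === ac.UPDATEREADY) {
        ac.swapCache();
        location.reload();      // reload so the page uses the new files
    }
});
ac.update();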
Great, your web site becomes an off-line app and gets automatically updated 🙂
Of course as you work on your site you are likely to end up with different versions of it. Each version has its own main html file and manifest giving a different appCache for each. This is fine, you can update the versions separately, and then invalidate just the one you updated – particularly useful if you want a frozen release version and a development version.
Of course there may be some files, for example icons and images, that are relatively static between versions, so you end up having both manifest files mentioning the same file. This is fine so long as the file never changes, but, if you ever do update that shared file, things get very odd indeed!
I will describe Chrome’s behaviour as it seems particularly ‘aggressive’ at caching, maybe because Google are trying to make their own web apps more efficient.
First you update the shared file (let’s call it shared.prob), then invalidate the two manifest files by updating them.
Next time you visit the site for appCache_1 Chrome notices that manifest_1 has been invalidated, so decides to check whether the files in the manifest need updating. When it gets to shared.prob it is about to go to the web to check it, then notices it is in appCache_2 – so uses that (old version).
Now it has the old version in appCache_1, but thinks it is up-to-date.
Next you visit the site associated with appCache_2; it notices manifest_2 is invalidated, checks files … and, you guessed it, when it gets to shared.prob, it takes the same old version from appCache_1 🙁 🙁
They seem to keep playing catch like that for ever!
The only way out is to navigate to the pseudo-url ‘chrome://appcache-internals/’, which lets you remove caches entirely … wonderful.
I don’t know if there is an equivalent to this on the Android browser; it certainly seems to have odd caching behaviour too, but does seem to ‘sort itself out’ after a time! Other browsers seem to have temporary problems like this, but a few forced refreshes seem to work!
For future versions I plan to use some Apache ‘Rewrite’ rules to make it look to the browser as if the shared files are in fact completely different files:
RewriteRule ^version_3/shared/(.*)$ /shared_place/$1 [L]
To be fair, the cache cycle is more of a problem during development than deployment, but still … so confusing.
Useful sites:
These are some sites I found useful for the application cache, but none sorted everything … and none mentioned Chrome’s infinite cache cycle!
- http://www.w3.org/TR/2008/WD-html5-20080122/#appcache
The W3C specification – of course this tells you how appCache is supposed to work, not necessarily what it does on actual browsers!
- http://www.html5rocks.com/en/tutorials/appcache/beginner/
It is called “A Beginner’s Guide to using the Application Cache”, but is actually pretty complete.
- http://appcachefacts.info
Really useful quick reference, but: “FACT: Any changes made to the manifest file will cause the browser to update the application cache.” – don’t you believe it! For some browsers (Chrome, Android) you have to add your own checks in the code (see the “Updating the cache” section in “A Beginner’s Guide …”).
- http://manifest-validator.com/
Wonderful on-line manifest file validator: checks both syntax and also whether all the referenced files download OK. Of course it cannot tell whether you have included all the files you need to.
Alt-HCI open reviews – please join in
Papers are online for the Alt-HCI track of the British HCI conference in September.
These are papers that are trying in various ways to push the limits of HCI, and we would like as many people as possible to join in discussion around them … and this discussion will be part of the process for deciding which papers are presented at the conference, and possibly how long we give them!
Here are the papers — please visit the site, comment, discuss, Tweet/Facebook about them.
paper #154 — How good is this conference? Evaluating conference reviewing and selectivity
do conference reviews get it right? is it possible to measure this?
paper #165 — Hackinars: tinkering with academic practice
doing vs talking – would you swop seminars for hack days?
paper #170 — Deriving Global Navigation from Taxonomic Lexical Relations
website design – can you find perfect words and structure for everyone?
paper #181 — User Experience Study of Multiple Photo Streams Visualization
lots of photos, devices, people – how to see them all?
paper #186 — You Only Live Twice or The Years We Wasted Caring about Shoulder-Surfing
are people peeking at your passwords? what’s the real security problem?
paper #191 — Constructing the Cool Wall: A Tool to Explore Teen Meanings of Cool
do you want to make things teens think are cool? find out how!
paper #201 — A computer for the mature: what might it look like, and can we get there from here?
over 50s have 80% of wealth, do you design well for them?
paper #222 — Remediation of the wearable space at the intersection of wearable technologies and interactive architecture
wearable technology meets interactive architecture
paper #223 — Designing Blended Spaces
where real and digital worlds collide
open data: for all or the few?
On Twitter Jeni Tennison asked:
Question: aside from personally identifiable data, is there any data that *should not* be open? @JenT 11:19 AM – 14 Jul 12
This sparked a Twitter discussion about limits to openness: exposure of undercover agents, information about critical services that could be exploited by terrorists, etc. My own answer was:
maybe all data should be open when all have equal ability to use it & those who can (e.g. Google) make *all* processed data open too @alanjohndix 11:34 AM – 14 Jul 12
That is, it is not clear that just because data is open to all, it can be used equally by everyone. In particular it will tend to be the powerful (governments and global companies) who have the computational facilities and expertise to exploit openly available data.
In India statistics about the use of their own open government data1 showed that the majority of access to the data was by well-off males over the age of 50 (oops that may include me!) – hardly a cross section of society. At a global scale Google makes extensive use of open data (and in some cases such as orphaned works or screen-scraped sites seeks to make non-open works open), but, quite understandably for a profit-making company, Google regards the amalgamated resources as commercially sensitive, definitely not open.
Open data has great potential to empower communities and individuals and serve to strengthen democracy2. However, we need to ensure that this potential is realised, to develop the tools and education that truly make this resource available to all3. If not then open data, like unregulated open markets, will simply serve to strengthen the powerful and dis-empower the weak.
- I had a reference to this at one point, but can’t locate it; does anyone else have the source for this?[back]
- For example, see my post last year “Private schools and open data” about the way Rob Cowen @bobbiecowman used UK government data to refute the government’s own education claims.[back]
- In fact there are a variety of projects and activities that work in this area: hackathons, data analysis and visualisation websites such as IBM Many Eyes, data journalism such as the Guardian Datablog; and some government and international agencies go beyond simply publishing data, offering tools to help users interpret it (I recall Enrico Bertini worked on this with one of the UN bodies some years ago). Indeed there will be some interesting data for mashing at the next Tiree Tech Wave in the autumn.[back]
not forgotten! 1997 scrollbars paper – best tech writing of the week at The Verge
Thanks to Marcin Wichary for letting me know that my 1997/1998 Interfaces article “Hands across the Screen” was just named in “Best Tech Writing of the Week” at The Verge. Some years ago Marcin reprinted the article in his GUIdebook: Graphical User Interface gallery, and The Verge picked it up from there.
Hands across the screen is about why we have scroll bars on the right-hand side, even though it makes more sense to have them on the left, close to our visual attention for text. The answer, I suggested, was that we mentally ‘imagine’ our hand crossing the screen, so a left-hand scroll-bar seems ‘wrong’, even though it is better (more on this later).
Any appreciation is obviously gratifying, but this is particularly so because it is a 15 year old article being picked up as ‘breaking’ technology news.
Interestingly, but perhaps not inconsequentially, the article was itself both addressing an issue current in 1997 and also looking back more than 15 years to the design of the Xerox Star and other early Xerox GUIs of the late 1970s and early 1980s, as well as to work at York in the mid 1980s.
Of course this should always be the case in academic writing: if the horizon is (only) 3-5 years leave it to industry. Academic research certainly can be relevant today (and the article in question was in 1997), but if it does not have the likelihood of being useful in 10–20 years, then it is not research.
At the turn of the Millennium I wrote in my regular HCI Education column for the SIGCHI Bulletin:
“Pick up a recent CHI conference proceedings and turn to a paper at random. Now look at its bibliography – how many references are there to papers or books written before 1990 (or even before 1995)? Where there are older references, look where they come from — you’ll probably find most are in other disciplines: experimental psychology, physiology, education. If our research papers find no value in the HCI literature more than 5 years ago, then what value has today’s HCI got in 5 years time? Without enduring principles we ought to be teaching vocational training courses not academic college degrees.”
(“the past, the future, and the wisdom of fools“, SIGCHI Bulletin, April 2000)
At the time about 90% of CHI citations were either to work in the last 5 years, or to the authors’ own work; to me that indicated a discipline in trouble — I wonder if it is any better today?
When revising the HCI textbook I am always pleased at the things that do not need revising — indeed some parts have hardly needed revising since the first edition in 1992. These parts seem particularly important in education – if something has remained valuable for 10, 15, 20 years, then it is likely to still be valuable to your students in a further 10, 15, 20 years. Likewise the things that are out of date after 5 years, even when revised, are also likely to be useless to your students even before they have graduated.
In fact, I have always been pleased with Hands across the Screen, even though it was short, and not published in a major conference or journal. It had its roots in an experiment in my first ever academic job at York in the mid-1980s, when we struggled to understand why the ‘obvious’ position for scroll arrows (bottom right) turned out not to work well. After a detailed analysis, we worked out that in fact the top-left was the best place (with some other manipulations), and this analysis was verified in use.
As an important meta-lesson what looked right turned out not to be right. User studies showed that it was wrong, but not how to put it right, and it was detailed analysis that filled the vital design gap. However, even when we knew what was right it still looked wrong. It was only years later (in 1997) that I realised that the discrepancy was because one mentally imagined a hand reaching across the screen, even though really one was using a mouse on the desk surface.
Visual (and other) impressions of designers and users can be wrong; as in any mature field, quite formal, detailed analysis is necessary to complement even the most experienced designer’s intuitions.
The original Interfaces article was followed by an even shorter subsidiary article, “Sinister Scrollbar in the Xerox Star Xplained“, that delved into the history of the direction of scroll arrows on a scrollbar, and how they arose partly from a mistake when Apple took over the Star designs! This is particularly interesting today given Apple’s perverse decision to remove scroll arrows completely — scrolling now feels like a Monte Carlo exercise, hoping you end up in the right place!
However, while it is important to find underlying principles, theories and explanations that stand the test of time, the application of these will certainly change. Whilst, for an old mouse + screen PC, the visual ‘hands across the screen’ impression was ‘wrong’ in terms of real use experience, touch devices such as the iPad have now changed this. It really is a good idea to have the scrollbar on the left so that you don’t cover up the screen as you scroll. Or, to be precise, it is good if you are right-handed. But look hard: there are never options to change this for left-handed users — is this not a discrimination issue? To be fair, tabs and menu items are normally found at the top of the screen, equally bad for all. As with the scroll arrows, it seems that Apple long ago gave up any pretence of caring for basic usability or ergonomics (one day those class actions will come from a crippled generation!) — if people buy because of visual and tactile design, why bother? And where Apple leads, the rest of the market follows 🙁
Actually it is not as easy as simply moving buttons around the screen; we have expectations from large screen GUI interfaces that we bring to the small screen, so any non-standard positioning needs to be particularly clear graphically. However, the diverse location of items on web pages and often bespoke design of mobile apps, whilst bringing their own problems of inconsistency, do give a little more flexibility.
So today, as you design, do think “hands”, and left hands as well as right hands!
And in 15 years time, who knows what we’ll have in our hands, but let’s see if the same deep principles still remain.
spice up boring lists of web links – add favicons using jQuery
Earlier today I was laying out lists of links to web resources, initially as simple links:
However, this looked a little boring, and so I thought it would be good to add each site’s favicon (the little icon it shows to the left in a web browser), and have a list like this:
The pages with the lists were being generated, and the icons could have been inserted using a server-side script, but to simplify the server-side code (for speed and maintainability) I put the fetching of favicons into a small JavaScript function using jQuery. The page is initially written (or generated) with default images, and the script simply fills in the favicons when the page is loaded.
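The gist is something along the following lines (a sketch only, not the actual library code; it assumes each placeholder image has class ‘favicon’ and sits inside the link it decorates):

// once the page is ready, replace each placeholder with the linked site's favicon
$(function() {
    $('img.favicon').each(function() {
        var img  = $(this);
        var href = img.closest('a').attr('href');               // the link this icon belongs to
        var site = href && href.match(/^(https?:\/\/[^\/]+)/);  // extract scheme://host
        if (site) {
            img.attr('src', site[1] + '/favicon.ico');           // conventional favicon location
        }
    });
});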
The list above is made by hand, but look at this example page to see the script in action.
You can use this in your own web pages and applications by simply including a few JavaScript files and adding classes to certain HTML elements.
See the favicon code page for a more detailed explanation of how it works and how to use it in your own pages.