using the Public Suffix list

On a number of occasions I have wanted to decompose domain names, for example in the URL recogniser in Snip!t.  However, one problem has always been the bit at the end.  It is clear that ‘com’ and ‘ac.uk’ are the principle suffixes of ‘www.alandix.com’ and ‘www.cs.bham.ac.uk’ respectively.  However, while I know that for UK domains it is the last two components that are important (second level domains), I never knew how to work this out in general for other countries.  Happily, Mozilla and other browser vendors have an initiative called the Public Suffix List , which provides a list of just these important critical second level (and deeper level) suffixes.

I recently found I needed this again as part of my Talis research.  There is a Ruby library and a Java sourceforge project for reading the Public Suffix list, and an implementation by the DKIM Reputation project, that transforms the list into generated tables for C, PHP and Perl.  However, nothing for easily and automatically maintaining access to the list.  So I have written a small PHP class to parse, store and access the Public Suffix list. There is an example in the public suffix section of the ‘code’ pages in this blog, and it also has its own microsite including more examples, documentation and a live demo to try.

Apache: pretty URLs and rewrite loops

[another techie post – a problem I had and can see that other people have had too]

It is common in various web frameworks to pass pretty much everything through a central script using Apache .htaccess file and mod_rewrite.  For example enabling permalinks in a WordPress blog generates an .htaccess file like this:

RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]

I use similar patterns for various sites such as vfridge (see recent post “Phoenix rises“) and Snip!t.  For Snip!t however I was using not a local .htaccess file, but an AliasMatch in httpd.conf, which meant I needed to ask Fiona every time I needed to do a change (as I can never remember the root passwords!).  It seemed easier (even if slightly less efficient) to move this to a local .htaccess file:

RewriteEngine On
RewriteBase /
RewriteRule ^(.*)$ code/top.php/$1 [L]

The intention is to map “/an/example?args” into “/code/top.php/an/example?args”.

Unfortunately this resulted in a “500 internal server error” page and in the Apache error log messages saying there were too many internal redirects.  This seems to be a common problem reported in forums (see here, here and here).  The reason for this is that .htaccess files are encountered very late in Apache’s processing and so anything rewritten by the rules gets thrown back into Apache’s processing almost as if they were a fresh request.  While the “[L]”(last)  flags says “don’t execute any more rules”, this means “no more rules on this pass”, but when Apache gets back to the .htaccess in the fresh round the rule gets encountered again and again leading to an infinite loop “/code/top/php/code/top.php/…/code/top.php/an/example?args”.

Happily, mod_rewrite thought of this and there is an additional “[NS]” (nosubreq) flag that says “only use this rule on the first pass”.  The mod_rewrite documentation for RewriteRule in Apache 1.3, 2.0 and 2.3 says:

Use the following rule for your decision: whenever you prefix some URLs with CGI-scripts to force them to be processed by the CGI-script, the chance is high that you will run into problems (or even overhead) on sub-requests. In these cases, use this flag.

I duly added the flag:

RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

This should work, but doesn’t.  I’m not sure why except that the Apache 2.2 documentation for NS|nosubreq reads:

NS|nosubreq

Use of the [NS] flag prevents the rule from being used on subrequests. For example, a page which is included using an SSI (Server Side Include) is a subrequest, and you may want to avoid rewrites happening on those subrequests.

Images, javascript files, or css files, loaded as part of an HTML page, are not subrequests – the browser requests them as separate HTTP requests.

This is identical to the documentation for 1.3, 2.0 and 2.3 except that quote about “URLs with CGI-scripts” is singularly missing.  I can’t find anything that says so, but my guess is that there was some bug (feature?) introduced 2.2 that is being fixed in 2.3.

WordPress is immune from the infinite loop as the directive “RewriteCond %{REQUEST_FILENAME} !-f” says “if the file exists use that without rewriting”.  As “index.php” is a file, the rule does not rewrite a second time.  However, the layout of my files meant that I sometimes have an actual file in the pseudo location (e.g. /an/example really exists).  I could have reorganised the complete directory structure … but then I would have been still fixing all the broken links now!

Instead I simply added an explicit “please don’t rewrite my top.php script” condition:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI}  !^/code/top.php/.*
RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

I suspect that this will be unnecessary when Apache upgrades to 2.3, but for now … it works 🙂

fix for toString error in PHPUnit

I was struggling to get PHPUnit to run under PHP 5.2.9. I’ve only used PHPUnit a little, so may have simply got something wrong, but I kept getting the error:

Catchable fatal error: Object of class AbcTest could not be converted to string in {dir}/PHPUnit/Framework/TestFailure.php on line 98

The error happens in the PHPUnit_Framework_TestFailure::toString method, which tries to implicitly convert a test case to a string.

The class AbcTest is my test case, which it is trying to display following a test failure.  PHPUnit test cases all extend PHPUnit_Framework_TestCase and while this has a toString method it does not have the ‘magic method__toString required by PHP 5.2 onwards.

To fix the problem I simply added the following method to the class PHPUnit_Framework_TestCase in PHPUnit/Framework/TestCase.php .

public function __toString()
 {
 return $this->toString();
 }

I am using PHPUnit 3.4.9, but peeking at 3.5.0beta it looks the same.  I’m guessing the PHPUnit_Framework_TestFailure::toString method is not used much so has got missed since the change to PHP 5.2.x.

PHPUnit is now in GITHub so I really ought to work out how to submit corrections to that … but another day I think.

Phoenix rises – vfridge online again

vfridge is back!

I mentioned ‘Project Phoenix’ in my last previous post, and this was it – getting vfridge up and running again.

Ten years ago I was part of a dot.com company aQtive1 with Russell Beale, Andy Wood and others.  Just before it folded in the aftermath of the dot.com crash, aQtive spawned a small spin-off vfridge.com.  The virtual fridge was a social networking web site before the term existed, and while vfridge the company went the way of most dot.coms, for some time after I kept the vfridge web site running on Fiona’s servers until it gradually ‘decayed’ partly due to Javascript/DOM changes and partly due to Java’s interactions with mysql becoming unstable (note very, very old Java code!).  But it is now back online 🙂

The core idea of vfridge is placing small notes, photos and ‘magnets’ in a shareable web area that can be moved around and arranged like you might with notes held by magnets to a fridge door.

Underlying vfridge was what we called the websharer vision, which looked towards a web of user-generated content.  Now this is passé, but at the time  was directly counter to accepted wisdom and looking back seem prescient – remember this was written in 1999:

Although everyone isn’t a web developer, it is likely that soon everyone will become an Internet communicator — email, PC-voice-comms, bulletin boards, etc. For some this will be via a PC, for others using a web-phone, set-top box or Internet-enabled games console.

The web/Internet is not just a medium for publishing, but a potential shared place.

Everyone may be a web sharer — not a publisher of formal public ‘content’, but personal or semi-private sharing of informal ‘bits and pieces’ with family, friends, local community and virtual communities such as fan clubs.

This is not just a future for the cognoscenti, but for anyone who chats in the pub or wants to show granny in Scunthorpe the baby’s first photos.

Just over a year ago I thought it would be good to write a retrospective about vfridge in the light of the social networking revolution.  We did a poster “Designing a virtual fridge” about vfridge years ago at a Computers and Fun workshop, but have never written at length abut its design and development.  In particular it would be good to analyse the reasons, technical, social and commercial, why it did not ‘take off’ the time.  However, it is hard to do write about it without good screen shots, and could I find any? (Although now I have)  So I thought it would be good to revive it and now you can try it out again. I started with a few days effort last year at Christmas and Easter time (leisure activity), but now over the last week have at last used the fact that I have half my time unpaid and so free for my own activities … and it is done 🙂

The original vfridge was implemented using Java Servlets, but I have rebuilt it in PHP.  While the original development took over a year (starting down in Coornwall while on holiday watching the solar eclipse), this re-build took about 10 days effort, although of course with no design decisions needed.  The reason it took so much development back then is one of the things I want to consider when I write the retrospective.

As far as possible the actual behaviour and design is exactly as it was back in 2000 … and yes it does feel clunky, with lots of refreshing (remember no AJAX or web2.0 in those days) and of course loads of frames!  In fact there is a little cleverness that allowed some client-end processing pre-AJAX2.    Also the new implementation uses the same templates as the original one, although the expansion engine had to be rewritten in PHP.  In fact this template engine was one of our most re-used bits of Java code, although now of course many alternatives.  Maybe I will return to a discussion of that in another post.

I have even resurrected the old mobile interface.  Yes there were WAP phones even in 2000, albeit with tiny green and black screens.  I still recall the excitement I felt the first time I entered a note on the phone and saw it appear on a web page 🙂  However, this was one place I had to extensively edit the page templates as nothing seems to process WML anymore, so the WML had to be converted to plain-text-ish HTML, as close as possible to those old phones!  Looks rather odd on the iPhone :-/

So, if you were one of those who had an account back in 2000 (Panos Markopoulos used it to share his baby photos 🙂 ), then everything is still there just as you left it!

If not, then you can register now and play.

  1. The old aQtive website is still viewable at aqtive.org, but don’t try to install onCue, it was developed in the days of Windows NT.[back]
  2. One trick used the fact that you can get Javascript to pre-load images.  When the front-end Javascript code wanted to send information back to the server it preloaded an image URL that was really just to activate a back-end script.  The frames  used a change-propagation system, so that only those frames that were dependent on particular user actions were refreshed.  All of this is preserved in the current system, peek at the Javascript on the pages.    Maybe I’ll write about the details of these another time.[back]

PHP syntax checker updated

Took a quick break today from Project Phoenix1.

I’ve had a PHP syntax checker on meandeviation for several years, but only checking PHP 4 as that is what is running on the server.  However, I had an email asking about PHP 5 , so now there is a PHP 5 version too 🙂

The syntax checker is a pretty simple layer over the PHP command line option “php -l” and also uses the PHP highlight_file function.  The main complication is parsing the HTML outputs of both as they change between versions of PHP!

There is also a download archive so you can also have it running locally on your own system.

  1. watch this space …[back]

fix for WordPress shortcode bug

I’m starting to use shortcodes heavily in WordPress1 as we are using it internally on the DEPtH project to coordinate our new TouchIT book.  There was minor bug which meant that HTML tags came out unbalanced (e.g. “<p></div></p”).

I’ve just been fixing it and posting a patch2, interestingly the bug was partly due to the fact that back-references in regular expressions count from the beginning of the regular expression, making it impossible to use them if the expression may be ‘glued’ into a larger one … lack of referential transparency!

For anyone having similar problems, full details and patch below (all WP and PHP techie stuff).

Continue reading

  1. see section “using dynamic binding” in What’s wrong with dynamic binding?[back]
  2. TRAC ticket #10490[back]

What’s wrong with dynamic binding?

Dynamic scoping/binding of variables has a bad name, rather like GOTO and other remnants of the Bad Old Days before Structured Programming saved us all1.  But there are times when dynamic binding is useful and looking around it is very common in web scripting languages, event propagation, meta-level programming, and document styles.

So is it really so bad?

Continue reading

  1. Strangely also the days when major advances in substance seemed to be more important than minor advances in nomenclature[back]

a simple PHP record sorter class

Not for the first time I needed to sort arrays of arrays in PHP (structures like tiny DB tables).  I have previously written little wrapper functions round usort, but decided this time to make a small class. It is a simple, but generic utility, so popping it up in case useful to anyone.

The rest of this post has moved to a permanent page at:

http://www.alandix.com/blog/code/sorter/