fix for WordPress shortcode bug

I’m starting to use shortcodes heavily in WordPress1 as we are using it internally on the DEPtH project to coordinate our new TouchIT book.  There was minor bug which meant that HTML tags came out unbalanced (e.g. “<p></div></p”).

I’ve just been fixing it and posting a patch2, interestingly the bug was partly due to the fact that back-references in regular expressions count from the beginning of the regular expression, making it impossible to use them if the expression may be ‘glued’ into a larger one … lack of referential transparency!

For anyone having similar problems, full details and patch below (all WP and PHP techie stuff).

The regex returned by get_shortcode_regex() in shortcodes.php had a back-reference ‘[/2]’ to match balanced tag-end-tag pairs such as “[x]text[/x]”.  The regex assumes it will be used ‘bare’ as a regular expression.  However, wpautop in formatting.php (which turns newlines in posts into <p>s or <br>s) needs to remove <p>s form standalone shortcodes. To do this it adds extra enclosing brackets to the shortcode regex to be able to refer to the entire matched tags and content.  This means that the back-reference instead matches the (optional) preceding letter which is usually blank and thus the closing tag does not match properly.

In the case of single tag-end-tag pair, this means that

 [divtag]
 abc
 [/divtag]

ends up as

 <div>
 <p>abc</p>
 <p></div></p>

This gets more complicated when there are internal tags such as:

 [mytag] some [specialsymbol] text [/mytag]

As in some cases the existing .*? for attribute matching chomps everything while it tries to find a ‘/’.

A simple fix would be to add an optional parameter to get_shortcode_regex($nested=0) and use this to modify the regular expression.  This would give the intended match for the shortcode regex, but in fact makes things worse. Looking at the same source:

 [divtag]
 abc
 [/divtag]

it would generate (assuming [divtag] generates a div):

 <div></p>
 <p>abc</p>
 <p></div>

The attached patch addresses this by adding two new functions to shortcodes.php returning regular expressions for matching begin and end tags separately.  wpautop then has two separate preg_replace lines doing the <p> fixes.  I’ve tried to change as little as possible, although sometime might return to do things like extend it to allow nested copies of the same tag.

At the same time as doing the abve bug fix,  I swopped the .*? for attribute matching to [^]]*?
I guess slightly slower, but doesn’t risk chomping the entire post as a shortcode attribute :-/

Download: shortcodes patch
Instructions for installing patches at Mark Jacquith‘s entry on My WordPress Toolbox.

The patch in full:


Index: wp-includes/shortcodes.php
===================================================================
--- wp-includes/shortcodes.php    (revision 11744)
+++ wp-includes/shortcodes.php    (working copy)
@@ -175,10 +175,40 @@
 $tagnames = array_keys($shortcode_tags);
 $tagregexp = join( '|', array_map('preg_quote', $tagnames) );

-    return '(.?)[('.$tagregexp.')b(.*?)(?:(/))?](?:(.+?)[/2])?(.?)';
+    return '(.?)[('.$tagregexp.')b([^]]*?)(?:(/))?](?:(.+?)[/2])?(.?)';
 }

 /**
+ * Retrieve the shortcode regular expression for searching for start tag only.
+ *
+ * @uses $shortcode_tags
+ *
+ * @return string The shortcode start tag search regular expression
+ */
+function get_shortcode_start_regex() {
+    global $shortcode_tags;
+    $tagnames = array_keys($shortcode_tags);
+    $tagregexp = join( '|', array_map('preg_quote', $tagnames) );
+
+    return '[('.$tagregexp.')b([^]]*?)(?:(/))?]';
+}
+
+/**
+ * Retrieve the shortcode regular expression for searching for end tag only.
+ *
+ * @uses $shortcode_tags
+ *
+ * @return string The shortcode end tag search regular expression
+ */
+function get_shortcode_end_regex() {
+    global $shortcode_tags;
+    $tagnames = array_keys($shortcode_tags);
+    $tagregexp = join( '|', array_map('preg_quote', $tagnames) );
+
+    return '[/('.$tagregexp.')]';
+}
+
+/**
 * Regular Expression callable for do_shortcode() for calling shortcode hook.
 * @see get_shortcode_regex for details of the match array contents.
 *
Index: wp-includes/formatting.php
===================================================================
--- wp-includes/formatting.php    (revision 11744)
+++ wp-includes/formatting.php    (working copy)
@@ -170,7 +170,8 @@
 if (strpos($pee, '<pre') !== false)
 $pee = preg_replace_callback('!(<pre[^>]*>)(.*?)</pre>!is', 'clean_pre', $pee );
 $pee = preg_replace( "|n</p>$|", '</p>', $pee );
-    $pee = preg_replace('/<p>s*?(' . get_shortcode_regex() . ')s*</p>/s', '$1', $pee); // don't auto-p wrap shortcodes that stand alone
+    $pee = preg_replace('/<p>s*?(' . get_shortcode_start_regex() . ')s*</p>/s', '$1', $pee); // don't auto-p wrap shortcodes that stand alone
+    $pee = preg_replace('/<p>s*?(' . get_shortcode_end_regex() . ')s*</p>/s', '$1', $pee);   // check both start and end tags

 return $pee;
 }

  1. see section “using dynamic binding” in What’s wrong with dynamic binding?[back]
  2. TRAC ticket #10490[back]

3 thoughts on “fix for WordPress shortcode bug

  1. I just checked back on the wordpress development site. The bug had been noted before and added a link back to my bug report, but I just checked and the older bug report was still unresolved and I’ve just added a comment to it pointing out that I have a patch.

    For the time being best you can do is use my patch, but then of course you will need to reapply it on every wordpress update until it is fixed in the core 🙁

  2. Actually, the couple of shortcode bugs are, technically, mostly fixed with the various patches that were submitted. What’s missing in reality are unit tests. Nearly every patch, so far, either introduced new issues or changed the behavior of shortcodes.

Comments are closed.