Alexander Blackman

Nice URLs with Perch

URL rewriting is tricky. Before I began editing my .htaccess file I researched what on earth I should be doing. Even after this I still found myself having what-the-fuck moments.

I’ve written this to hopefully help others avoid the mistakes I made. I am writing from a position of hindsight, but I will very much try to show where I got stuck and what I did wrong.

What are we trying to achieve?

I’ll take it for granted that we all know why we want to rewrite URLs, and that we have some idea of how this is achieved. If not, I would recommend reading Drew McLellan’s URL Rewriting for the Fearful.

My setup

My website uses Perch for the CMS and I use the Blog app. The php files for the blog app are in a subfolder which contains the archive.php (list of posts), post.php (the post itself) and rss.php (the RSS feed) files.

Every other php file is just in the root folder of the site. The archive.php file was not the default one, as I had already stripped out the code for tags and categories, which I had no plan to use.

My first stab at the RewriteRules looked like this:

RewriteRule ^articles/?$ /articles/archive.php? [NC,L]
RewriteRule ^articles/([a-zA-Z0-9-]+)$ /articles/post.php?s=$1 
RewriteRule ^articles/([a-zA-Z0-9-]+)/preview$ /articles/post.php?s=$1&preview=all [NC,L]

Here you can see that the first rule has the URL domain/articles point to the archive.php file. The /? is an optional forward slash, so that domain/articles and domain/articles/ both work the same.

The second rule has the URL domain/articles/article-title point to the post.php file. As Perch generates the URL for each post, there are two settings that need to be amended in the admin:

  • Blog post page path as /articles/{postSlug}
  • Slug format as {postTitle}

The default slug format for Perch includes date information, but I had no intention of filtering by date. The Blog post page path is the same thing as the postURL that you might use in a template.

These rules worked: if you typed in domain/articles you got a list of articles; if you typed in domain/articles/article-title you got the article itself. Also, if you typed in domain/any-old-text you would get a 404, but if you typed domain/perch (an actual subfolder) you could log in without any issues.
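One thing the snippet above leaves out is that rewrite rules only run once the rewrite engine is switched on. Below is a minimal sketch of the whole file. It assumes the .htaccess sits in the site root, that mod_rewrite is enabled, and that the server allows .htaccess overrides; the rules themselves are unchanged.

```apache
# Rewrite rules are ignored unless the engine is switched on
RewriteEngine On

# domain/articles or domain/articles/ -> list of posts
RewriteRule ^articles/?$ /articles/archive.php? [NC,L]

# domain/articles/article-title -> single post, slug passed as ?s=
RewriteRule ^articles/([a-zA-Z0-9-]+)$ /articles/post.php?s=$1

# domain/articles/article-title/preview -> same post in preview mode
RewriteRule ^articles/([a-zA-Z0-9-]+)/preview$ /articles/post.php?s=$1&preview=all [NC,L]
```

If every rule seems to be ignored, a missing RewriteEngine On line is one of the first things worth checking.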

Problem number one: no content

Well, the first thing I found was that if you typed in domain/articles/any-old-text you ended up being shown post.php with no content. It took me a while to figure out that the cause of this was not the rewrite rule.

The reason was that no post with the slug any-old-text existed, so no content could be rendered. The solution was to add a <perch:noresults> block to the post.html template, to provide a fallback should there be no article.

Fickle people

I was happy with these rules; all was working well. Then I decided that I no longer wanted a single blog, but two! At the time, Perch did not have a feature for multiple blogs (although Perch Runway does), but it does have sections.

Stupidly I initially duplicated my articles folder (with archive.php, post.php and rss.php) and renamed the copy to projects. I soon realised this was overkill as really there was only one blog; all posts are saved in the same place.

What I needed to do was rewrite the URL, so that either articles/article-title or projects/project-title would point to the same file. So I duplicated my rewrite rules instead:

RewriteRule ^articles/?$ /articles/archive.php? [NC,L]
RewriteRule ^articles/([a-zA-Z0-9-]+)$ /articles/post.php?s=$1 
RewriteRule ^articles/([a-zA-Z0-9-]+)/preview$ /articles/post.php?s=$1&preview=all [NC,L]
RewriteRule ^projects/?$ /projects/index.php? [NC,L]
RewriteRule ^projects/([a-zA-Z0-9-]+)$ /projects/post.php?s=$1
RewriteRule ^projects/([a-zA-Z0-9-]+)/preview$ /projects/post.php?s=$1&preview=all [NC,L]

Problem number two: mixed content

Again, while the rules technically worked, there were two problems:

  1. archive.php would display all posts for both sections
  2. You could swap /articles/same-post-title for /projects/same-post-title and still view the post

What needed to happen was to filter archive.php and post.php by section. To do that, the php on those pages needed to know which section to use, which it can read from the query string:

            'template'   => 'post_in_list.html',
            'count'      => 20,
            'section'    => perch_get('section'), // section name taken from the query string
            'sort'       => 'postDateTime',
            'sort-order' => 'DESC',

I also changed the Blog post page path to /{sectionSlug}/{postSlug} so that the section was always part of the URL. For the rewrite rules, I first of all wrote two:

RewriteRule ^articles/?$ /articles/archive.php?section=articles [NC,L]
RewriteRule ^projects/?$ /articles/archive.php?section=projects [NC,L]

You can see this is excessive, and I also ran into some weird behaviour: I kept seeing domain/articles/?section=articles in the address bar. All the content was ok, but what was going on with that URL? I spent some time trying to figure this out before I stumbled across a common pitfall: my rewrite rule was matching an actual folder.

The solution was simply to rename the folder to blog. I also knew enough to know that I could capture text on the left and reuse it on the right. So I copied the mechanism from the post.php rule and ended up with:

RewriteRule ^([a-zA-Z0-9-]+)/?$ /blog/archive.php?section=$1 

The problem now was that any single-segment URL pointed to the blog archive.php, including actual folders like /perch. So I thought, ok, I only really want this rule to work when the words articles and projects are used.

I did a bit more research into regular expressions and found:
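For reference, a common alternative to restricting the pattern is to guard a catch-all rule with RewriteCond tests so that it skips anything that really exists on disk. This is only a sketch of that approach, not the route taken here, and it assumes the same /blog/archive.php target as the rule above.

```apache
RewriteEngine On

# Only apply the next rule when the request is NOT a real directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...and NOT a real file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-zA-Z0-9-]+)/?$ /blog/archive.php?section=$1 [L]
```

With the conditions in place /perch is left alone, but domain/any-old-text would still be sent to archive.php with a section that does not exist, so naming the sections explicitly is the tighter fix.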

RewriteRule ^(articles|projects)/?$ /blog/archive.php?section=$1 

What the | does is turn it into an or statement. So the rule now works only if the URL is articles or projects, and passes whichever it is on to the actual URL to be used by the php script.

I also looked into the query ?section and whether section was some kind of keyword. Quite often in the Perch documentation you see ?s= or ?slug= being used.

The truth is that these are not keywords and can in fact be any text at all. You could write ?pandasocks=whatever-text and it will work. The trick with Perch is that on the php page you reuse the same word. So in my archive.php file I used perch_get('section') and not perch_get('s').
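To make that concrete, here is a sketch of the archive rule using the deliberately silly parameter name. Nothing here is special to Perch or Apache; pandasocks is purely an illustrative name.

```apache
RewriteEngine On

# The name after "?" is arbitrary; "pandasocks" has no special meaning
RewriteRule ^(articles|projects)/?$ /blog/archive.php?pandasocks=$1 [NC,L]

# archive.php then has to ask for the same name: perch_get('pandasocks')
```

The only requirement is that the name in the rule and the name passed to perch_get() match.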

Applying the same logic

To “fix” posts being viewable under the wrong section, I decided to filter post.php by section as well. As with the archive.php rule, I captured the section and added it to the URL:

RewriteRule ^(articles|projects)/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1

I did try just capturing only the title and not passing the section at all, but I could not get articles to display consistently. For some reason projects would always work fine, but articles would constantly show no results.

All’s well that ends in [L]

With all the little niggles ironed out, I was relieved that my site was behaving how I thought it should.

I cannot honestly say whether my rules are particularly robust, or whether there are performance issues, but they do work. In one last bid to make the rules more human friendly I thought, what about if someone types in the singular article instead of the plural articles?

My rules shouldn’t really penalise someone for thinking like this. After all, domain/article/article-title could be considered more correct than domain/articles/article-title. My final set of rules is as follows:

RewriteRule ^(articles|projects)/?$ /blog/archive.php?section=$1 [NC,L]
RewriteRule ^(article|project)/?$ /blog/archive.php?section=$1s [NC,L]

RewriteRule ^(articles|projects)/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1  [NC,L]
RewriteRule ^(article|project)/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1s  [NC,L]

RewriteRule ^(articles|projects)/([a-zA-Z0-9-]+)/preview$ /blog/post.php?s=$2&section=$1&preview=all  [NC,L]
RewriteRule ^(article|project)/([a-zA-Z0-9-]+)/preview$ /blog/post.php?s=$2&section=$1s&preview=all  [NC,L]
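The singular and plural pairs could probably be folded together by making the trailing s optional in the pattern and always adding it back in the target. This is an untested sketch of that idea rather than the rules running on the site, and it assumes RewriteEngine On is set as usual.

```apache
RewriteEngine On

# (article|project) is captured as $1 and s? makes the plural optional.
# The target always appends "s", so section is "articles" or "projects".
RewriteRule ^(article|project)s?/?$ /blog/archive.php?section=$1s [NC,L]

# Same idea for a single post; the slug becomes $2
RewriteRule ^(article|project)s?/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1s [NC,L]

# ...and for the preview URL
RewriteRule ^(article|project)s?/([a-zA-Z0-9-]+)/preview$ /blog/post.php?s=$2&section=$1s&preview=all [NC,L]
```

Three rules are easier to keep in sync than six, at the cost of a pattern that is slightly harder to read at a glance.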

Want to learn more?

There are many sites and articles that cover rewrite rules but the ones below I found to be most useful.

15 December 2015
