Friendly URLs with Perch Blog

URL re-writing is tricky. Before I began editing my .htaccess file I researched what on earth I should be doing. Even after this I still found myself having what-the-fuck moments. What I have written here is more a series of notes than a complete guide. But if it is confusing do check out the references as they helped me.

What are we trying to achieve?

I’ll take it for granted that we all know why we want to rewrite URLs, and that we have some idea of how this is achieved. If not, I would recommend reading Drew McLellan’s URL Rewriting for the Fearful.

My setup

This website uses Perch for the CMS and the Blog app. The php files for the blog app are in a subfolder blog which contains:

  • archive.php (list of posts)
  • post.php (the post itself), and
  • rss.php (the RSS feed)

Every other php file is in the root folder for the website. I also have some base rewrite rules:

  1. To never show www. in front of the domain:

    RewriteCond %{HTTPS} !=on
    RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]

  2. To hide the file extension:

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^(.+)$ $1.php [L,QSA]

  3. And Drew’s suggestion of preventing any rewrites on the perch folder with:

    RewriteRule ^perch - [L]

My aim was to end up with domain/articles/title-of-post. My first stab at the RewriteRules looked like this:

RewriteRule ^articles/?$ /articles/archive.php? [NC,L]
RewriteRule ^articles/([a-zA-Z0-9-]+)$ /articles/post.php?s=$1 
RewriteRule ^articles/([a-zA-Z0-9-]+)/preview$ /articles/post.php?s=$1&preview=all [NC,L]

Here you can see the first rule is to have the URL domain/articles point to the archive.php file.

The /? is an optional forward slash so that both domain/articles and domain/articles/ would both work.
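To convince myself what the optional slash does, here is a minimal sketch in Python rather than Apache. It assumes Python's re module treats a pattern this simple the same way Apache does, and that the path is tested without its leading slash, as happens in .htaccess. The helper name matches_archive is made up for illustration.

```python
import re

# Pattern copied from the first RewriteRule; re.IGNORECASE stands in for [NC].
ARCHIVE_PATTERN = re.compile(r"^articles/?$", re.IGNORECASE)


def matches_archive(path: str) -> bool:
    """Hypothetical helper: does this path (no leading slash) hit the archive rule?"""
    return ARCHIVE_PATTERN.search(path) is not None


if __name__ == "__main__":
    for path in ("articles", "articles/", "articles/title-of-post"):
        print(f"{path!r}: {matches_archive(path)}")
```

Both articles and articles/ match, while anything after the slash falls through to the later rules.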

The second rule was to have the URL domain/articles/article-title point to the post.php file. Because Perch generates the URL for each post, two settings need to be amended in the admin:

  • Blog post page path set to /articles/{postSlug}
  • Slug format set to {postTitle}

The default slug format in Perch includes date information, but I had no intention of filtering by date. The Blog post page path is the same thing as the postURL that you might use in a template.

These rules worked: kind of. If you typed domain/articles you got a list of articles. If you typed domain/articles/article-title you got the article itself. But there were problems…

Problem number one: no content

If you typed domain/articles/any-old-text you ended up being shown post.php with no content.

Now it took me a while to figure out that the cause of this was not the rewrite rule. The reason was any-old-text did not exist so no content could be rendered.

The solution was to add <perch:noresults> into the post.html template. Here you can write a fallback should there be no article.

Problem two: changing my mind

I decided that I no longer wanted a single blog, but two! At the time, Perch did not have a feature for multiple blogs, but it did have sections.

Stupidly I decided to duplicate my articles folder - with archive.php, post.php and rss.php - and rename the copy to projects.

I soon realised this was overkill. Really there was only one blog. All posts are saved in the same place.

What I needed to do was rewrite the URL, so that either articles/article-title or projects/project-title would point to the same file.

So – stupidly? – I duplicated my Rewrite rules instead:

RewriteRule ^articles/?$ /articles/archive.php? [NC,L]
RewriteRule ^articles/([a-zA-Z0-9-]+)$ /articles/post.php?s=$1 
RewriteRule ^articles/([a-zA-Z0-9-]+)/preview$ /articles/post.php?s=$1&preview=all [NC,L]
RewriteRule ^projects/?$ /projects/index.php? [NC,L]
RewriteRule ^projects/([a-zA-Z0-9-]+)$ /projects/post.php?s=$1
RewriteRule ^projects/([a-zA-Z0-9-]+)/preview$ /projects/post.php?s=$1&preview=all [NC,L]

Again, while the rules technically worked, there were two problems:

  1. archive.php would display all posts for both sections
  2. You could swap /articles/same-post-title for /projects/same-post-title and still view the post

What needed to happen was to filter archive.php and post.php by the section. To do that, the php in those pages needed to know which section to use:

 <?php
     perch_blog_custom(array(
         'template'   => 'post_in_list.html',
         'count'      => 20,
         // Read the section from the URL instead of passing a literal string
         'section'    => perch_get('section'),
         'sort'       => 'postDateTime',
         'sort-order' => 'DESC',
     ));
 ?>

I also changed the Blog post page path to /{sectionSlug}/{postSlug} so that the section was always part of the URL.

On the rewrite rules I first of all wrote two rules:

RewriteRule ^articles/?$ /articles/archive.php?section=articles [NC,L]
RewriteRule ^projects/?$ /articles/archive.php?section=projects [NC,L]

But I kept seeing domain/articles/?section=articles in the address bar. All the content was ok, but what’s going on with that URL?

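I had stumbled across what is a common pitfall: my rewrite rule matched an actual folder. As far as I can tell, when the requested path is a real directory Apache redirects to add the trailing slash, and the rewritten query string comes along for the ride.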

The solution was to rename the folder to blog.

Problem three: trying to be clever

I also knew enough about writing rewrite rules to know that I could capture text on the left and reuse it on the right.

So I copied the mechanism from the post.php rule and ended up with:

RewriteRule ^([a-zA-Z0-9-]+)/?$ /blog/archive.php?section=$1 

The problem with this is that the rule is too generic. Any single-segment URL now points to archive.php, including actual folders like /perch.

I thought, ok, I only really want this rule to work when the words articles and projects are used. So after a bit more research into writing regular expressions I came up with:

RewriteRule ^(articles|projects)/?$ /blog/archive.php?section=$1 

The vertical bar | means “or”. So the rule only matches if articles or projects is typed. The rule then passes whichever one it was into the actual URL, to be used by the php script.
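The difference between the too-generic rule and the one with the vertical bar can be sketched in Python. This is only an approximation: it assumes Python's re module agrees with Apache for these patterns, and section_for is a hypothetical helper, not anything Perch or Apache provides.

```python
import re
from typing import Optional

GENERIC = re.compile(r"^([a-zA-Z0-9-]+)/?$")       # the too-generic attempt
SPECIFIC = re.compile(r"^(articles|projects)/?$")  # articles OR projects only


def section_for(pattern: "re.Pattern[str]", path: str) -> Optional[str]:
    """Hypothetical helper: return the captured section, or None if the rule does not apply."""
    match = pattern.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path in ("articles", "projects/", "perch"):
        print(path, section_for(GENERIC, path), section_for(SPECIFIC, path))
```

The generic pattern happily captures perch as a section, while the second pattern ignores it and only hands on articles or projects.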

I applied this to the post.php file, capturing both section and the post title:

RewriteRule ^(articles|projects)/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1

I did try capturing only the title and not passing the section at all, but I could not get articles to display content.

Projects would always work fine, but articles would constantly show no results.

A side note about URL parameters

I wasn’t sure if ?section was some kind of keyword. Quite often in the Perch documentation you see ?s= or ?slug= being used.

As far as I could make out ?section was not a keyword. You could write ?pandasocks=whatever-text and it will work.

But the trick is that on the php page you use the same word. So in my archive.php file I used perch_get('section') and not perch_get('s').
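The same idea in a small Python sketch: the parameter name is arbitrary, as long as the code reading it asks for the same name. Here get_param is a hypothetical stand-in for what perch_get does with the query string. It is not Perch's implementation, just a dictionary lookup on the parsed URL.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def get_param(url: str, name: str) -> Optional[str]:
    """Hypothetical stand-in for perch_get(): first value of a query parameter, or None."""
    query = urlsplit(url).query
    values = parse_qs(query).get(name)
    return values[0] if values else None


if __name__ == "__main__":
    rewritten = "/blog/archive.php?section=projects"
    print(get_param(rewritten, "section"))  # the name used in the rewrite rule
    print(get_param(rewritten, "s"))        # a different name finds nothing
```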

All’s well that ends in [L]

I cannot say whether my rules are robust, or whether there are any performance issues, but they do work.

In one last bid to make the rules more friendly I thought, “what if someone types in the singular article instead of the plural articles?”

My rules shouldn’t penalise someone for thinking like this. After all domain/article/article-title could be considered more correct than domain/articles/article-title.

So the final set of rules is:

RewriteRule ^(articles|projects)/?$ /blog/archive.php?section=$1 [NC,L]
RewriteRule ^(article|project)/?$ /blog/archive.php?section=$1s [NC,L]
RewriteRule ^(articles|projects)/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1 [NC,L]
RewriteRule ^(article|project)/([a-zA-Z0-9-]+)$ /blog/post.php?s=$2&section=$1s [NC,L]
RewriteRule ^(articles|projects)/([a-zA-Z0-9-]+)/preview$ /blog/post.php?s=$2&section=$1&preview=all [NC,L]
RewriteRule ^(article|project)/([a-zA-Z0-9-]+)/preview$ /blog/post.php?s=$2&section=$1s&preview=all [NC,L]
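If you want to check what these rules do without touching a live server, they can be roughly simulated in Python. This is a sketch under assumptions: the rules are tried in order and the first match wins (standing in for [L]), re.IGNORECASE stands in for [NC], and Apache's $1 and $2 are written as \1 and \2. The rewrite function is made up for illustration, and the real mod_rewrite does a good deal more than this.

```python
import re
from typing import Optional

# (pattern, target) pairs in the same order as the final rules above.
RULES = [
    (r"^(articles|projects)/?$", r"/blog/archive.php?section=\1"),
    (r"^(article|project)/?$", r"/blog/archive.php?section=\1s"),
    (r"^(articles|projects)/([a-zA-Z0-9-]+)$", r"/blog/post.php?s=\2&section=\1"),
    (r"^(article|project)/([a-zA-Z0-9-]+)$", r"/blog/post.php?s=\2&section=\1s"),
    (r"^(articles|projects)/([a-zA-Z0-9-]+)/preview$",
     r"/blog/post.php?s=\2&section=\1&preview=all"),
    (r"^(article|project)/([a-zA-Z0-9-]+)/preview$",
     r"/blog/post.php?s=\2&section=\1s&preview=all"),
]


def rewrite(path: str) -> Optional[str]:
    """Hypothetical helper: return the target of the first matching rule, or None."""
    for pattern, target in RULES:
        match = re.search(pattern, path, flags=re.IGNORECASE)
        if match:
            # expand() fills the captured groups into the target template
            return match.expand(target)
    return None


if __name__ == "__main__":
    for path in ("articles", "project/my-build", "articles/my-post/preview", "perch"):
        print(path, "->", rewrite(path))
```

One thing this makes visible: because the match ignores case, typing Articles passes section=Articles through with its capital letter, so whether that still finds the section depends on how the php side compares it.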