Mod_rewrite Revisted

Mod_Rewrite Revisted

"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."

- Brian Behlendorf, Apache Group

One of the more powerful tricks of the .htaccess hacker is the ability to rewrite URLs. This enables us to do some mighty manipulations on our links; useful stuff like transforming very long URL's into short, cute URLs, transforming dynamic ?generated=page&URL's into /friendly/flat/links, redirect missing pages, preventing hot-linking, performing automatic language translation, and much, much more.

Make no mistake, mod_rewrite is complex. This isn't the subject for a quick bite-size tech-snack, probably not even a week-end crash-course, I've seen guys pull off some real cute stuff with mod_rewrite, but with kudos-hat tipped firmly towards that bastard operator from hell, Ralf S. Engelschall, author of the magic module itself, I have to admit that a great deal of it still seems so much voodoo to me.

The way that rules can work one minute and then seem not to the next, how browser and other in-between network caches interact with rules and testing rules is often baffling, maddening. When I feel the need to bend my mind completely out of shape, I mess around with mod_rewrite!

After all this, it does work, and while I'm not planning on taking that week-end crash-course any time soon, I have picked up a few wee tricks myself, messing around with webservers and web sites, this place..

The plan here is to just drop some neat stuff, examples, things that has proven useful, stuff that works on a variety of server setups; there are apache's all over my LAN, I keep coming across old .htaccess files stuffed with past rewriting experiments that either worked; and I add them to my list, or failed dismally; and I'm surprised that more often these days, I can see exactly why!

Nothing here is my own invention. Even the bits I figured out myself were already well documented, I just hadn't understood the documents, or couldn't find them. Sometimes, just looking at the same thing from a different angle can make all the difference, so perhaps this humble stab at URL Rewriting might be of some use. I'm writing it for me, of course. but I do get some credit for this..

# time to get dynamic, see..
rewriterule ^(.*)\.htm $1.php

beginning rewriting..

Whenever you use mod_rewrite (the part of apache that does all this magic), you need to do..

you only need to do this once per .htaccess file:
Options +FollowSymlinks
RewriteEngine on

Before any ReWrite rules. note: +FollowSymLinks must be enabled for any rules to work, this is a security requirement of the rewrite engine. Normally it's enabled in the root and you shouldn't have to add it, but it doesn't hurt to do so, and I'll insert it into all the examples on this page, just in case.

The next line simply switches on the rewrite engine for that folder. if this directive is in you main .htaccess file, then the ReWrite engine is theoretically enabled for your entire site, but it's wise to always add that line before you write any redirections, anywhere.

note: while some of the directives on this page may appear split onto two lines, in your .htaccess file, they must exist completely on one line. If you drag-select and copy the directives on this page, they should paste just fine into any text editor.

simple rewriting

Simply put, Apache scans all incoming URL requests, checks for matches in our .htaccess file and rewrites those matching URLs to whatever we specify. something like this..

all requests to whatever.htm will be sent to whatever.php:
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^(.*)\.htm$ $1.php [nc]

Handy for anyone updating a site from static htm (you could use .html, or .htm(.*), .htm?, etc) to dynamic php pages; requests to the old pages are automatically rewritten to our new urls. no one notices a thing, visitors and search engines can access your content either way. leave the rule in; as an added bonus, this enables us to easily split php code and its included html structures into two separate files, a nice idea; makes editing and updating a breeze. The [nc] part at the end means "No Case", or "case-insensitive", but we'll get to the switches later.

Folks can link to whatever.htm or whatever.php, but they always get whatever.php in their browser, and this works even if whatever.htm doesn't exist! but I'm straying..

As it stands, it's a bit tricky; folks will still have whatever.htm in their browser address bar, and will still keep bookmarking your old .htm URL's. Search engines, too, will keep on indexing your links as .htm, some have even argued that serving up the same content from two different places could have you penalized by the search engines. This may or not bother you, but if it does, mod_rewrite can do some more magic..

this will do a "real" http redirection:
Options +FollowSymlinks
rewriteengine on
rewriterule ^(.+)\.htm$ http://corz.org/$1.php [r=301,nc]

This time we instruct mod_rewrite to send a proper HTTP "permanently moved" redirection, aka; "301". Now, instead of just redirecting on-the-fly, the user's browser is physically redirected to a new URL, and whatever.php appears in their browser's address bar, and search engines and other spidering entities will automatically update their links to the .php versions; everyone wins. and you can take your time with the updating, too.

not-so-simple rewriting

You may have noticed, the above examples use regular expression to match variables. what that simply means is.. match the part inside (.+) and use it to construct "$1" in the new URL. in other words, (.+) = $1 you could have multiple (.+) parts and for each, mod_rewrite automatically creates a matching $1, $2, $3, etc, in your target URL, something like..

a more complex rewrite rule:
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^files/(.+)/(.+).zip download.php?section=$1&file=$2 [nc]

would allow you to present a link as..

http://mysite/files/games/hoopy.zip

and in the background have that translated to..

http://mysite/download.php?section=games&file=hoopy

which some script could process. you see, many search engines simply don't follow our ?generated=links, so if you create generating pages, this is useful. However, it's only the dumb search engines that can't handle these kinds of links; we have to ask ourselves.. do we really want to be listed by the dumb search engines? Google will handle a good few parameters in your URL without any problems, and the (hungry hungry) msn-bot stops at nothing to get that page, sometimes again and again and again…

I personally feel it's the search engines that should strive to keep up with modern web technologies, in other words; we shouldn't have to dumb-down for them. But that's just my opinion. Many users will prefer /files/games/hoopy.zip to /download.php?section=games&file=hoopy but I don't mind either way. As someone pointed out to me recently, presenting links as/standard/paths means you're less likely to get folks doing typos in typed URL's, so something like..

an even more complex rewrite rule:
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^blog/([0-9]+)-([a-z]+) http://corz.org/blog/index.php?archive=$1-$2 [nc]

would be a neat trick, enabling anyone to access my blog archives by doing..

http://corz.org/blog/2003-nov

in their browser, and have it automagically transformed server-side into..

http://corz.org/blog/index.php?archive=2003-nov

which corzblog would understand. It's easy to see that with a little imagination, and a basic understanding of posix regular expression, you can perform some highly cool URL manipulations.

Here's the very basics of regexp (roughly from the apache mod_rewrite documentation)..

Text:
. Any single character
[chars] Character class: One of chars
[^chars] Character class: None of chars
text1|text2 Alternative: text1 or text2 (ie. "or")

Quantifiers:
? 0 or 1 of the preceding text
* 0 or N of the preceding text (hungry)
+ 1 or N of the preceding text

Grouping:
(text) Grouping of text
(either to set the borders of an alternative or
for making backreferences where the nth group can
be used on the target of a RewriteRule with $n)

Anchors:
^ Start of line anchor
$ End of line anchor

Escaping:
\char escape that particular char
(for instance to specify special characters.. ".[]()" etc.)

shortening URLs

One common use of mod_rewrite is to shorten URL's. shorter URL's are easier to remember and, of course, easier to type. an example..

beware the regular expression:
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^grab(.*) /public/files/download/download.php$1

this rule would transform this user's URL..

http://mysite/grab?file=my.zip

server-side, into..

http://mysite/public/files/download/download.php?file=my.zip

which is a wee trick I use for my distro machine, among other things. everyone likes short URL's. and so will you; using this technique, you can move /public/files/download/ to anywhere else in your site, and all the old links still work fine. just alter your .htaccess file to reflect the new location. edit one line, done. nice. means even when stuff is way deep in your site you can have cool links like this.. /trueview/sample.php and this.

cooler access denied

In part one I demonstrated a drop-dead simple mechanism for denying access to particular files and folders. The trouble with this is the way our user gets a 403 "Access Denied" error, which is a bit like having a door slammed in your face. Fortunately, mod_rewrite comes to the rescue again and enables us to do less painful things. One method I often employ is to redirect the user to the parent folder..

they go "huh?.. ahhh!"
# send them up!
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^(.*)$ ../ [nc]

It works great, though it can be a wee bit tricky with the URLs, and you may prefer to use a harder location, which avoids potential issues in indexed directories, where folks can get in a loop..

they go damn! Oh!
# send them exactly there!
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^(.*)$ /comms/hardware/router/ [nc]

Sometimes you'll only want to deny access to most of the files in the directory, but allow access to maybe one or two files, or file types, easy..
deny with style!
# users can load only "special.zip", and the css and js files.
Options +FollowSymlinks
RewriteEngine On
rewritecond %{REQUEST_FILENAME} !^(.+)\.css$
rewritecond %{REQUEST_FILENAME} !^(.+)\.js$
rewritecond %{REQUEST_FILENAME} !special.zip$
RewriteRule ^(.+)$ /chat/ [nc]

Here we take the whole thing a stage further. Users can access .css (stylesheet) and javascript files without problem, and also the file called "special.zip", but requests for any other filetypes are immediately redirected back up to the main "/chat/" directory. You can add as many types as you need. You could also bundle the filetypes into one line using | (or) syntax, though individual lines are perhaps clearer.

prevent hot-linking

Believe it or not, there are some webmasters who, rather than coming up with their own content will steal yours. really! even worse, they won't even bother to copy to their own server to serve it up, they'll just link to your content! no, it's true, in fact, it used to be incredibly common. these days most people like to prevent this sort of thing, and .htaccess is one of the best ways to do it.

This is one of those directives where the mileage variables are at their limits, but something like this works fine for me..

how DARE they!
Options +FollowSymlinks
# no hot-linking
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?corz\.org/ [nc]
RewriteRule .*\.(gif|jpg|png)$ http://corz.org/img/hotlink.png [nc]

You may see the last line broken into two, but it's all one line (all the directives on this page are). let's have a wee look at what it does..

We begin by enabling the rewrite engine, as always.

The first RewriteCond line allows direct requests (not from other pages - an "empty referrer") to pass unmolested. The next line means; if the browser did send a referrer header, and the word "corz.org" is not in the domain part of it, then DO rewrite this request.

The all-important final RewriteRule line instructs mod_rewrite to rewrite all matched requests (anything without "corz.org" in its referrer) asking for gifs, jpegs, or pngs, to an alternative image. mine says "no hotlinking!". You can see it in action here. There are loads of ways you can write this rule. google for "hot-link protection" and get a whole heap. simple is best. you could send a wee message instead, or direct them to some evil script, or something.

lose the "www"

I'm often asked how I prevent the "www" part showing up at my site, so I guess I should add something about that. Briefly, if someone types http://www.corz.org/ into their browser (or uses the www part for any link at corz.org) it is redirected to the plain, rather neat, http://corz.org/ version. This is very simple to achieve, like this..

beware the regular expression:
Options +FollowSymlinks
RewriteEngine on
rewritecond %{http_host} ^www\.corz\.org [nc]
rewriterule ^(.*)$ http://corz.org/$1 [r=301,nc]

You don't need to be a genius to see what's going on here. There are other ways you could write this rule, but again, simple is best. Like most of the examples here, the above is pasted directly from my own main .htaccess file, so you can be sure it works perfectly.

automatic translation

If you don't read English, or some of your guests don't, here's a neat way to have the wonderful Google translator provide automatic on-the-fly translation for your site's pages. Something like this..

they simply add their country code to the end of the link, or you do..
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^(.*)-fr$ http://www.google.com/translate_c?hl=fr&sl=en&u=http://corz.org/$1 [r,nc]
RewriteRule ^(.*)-de$ http://www.google.com/translate_c?hl=de&sl=en&u=http://corz.org/$1 [r,nc]
RewriteRule ^(.*)-es$ http://www.google.com/translate_c?hl=es&sl=en&u=http://corz.org/$1 [r,nc]
RewriteRule ^(.*)-it$ http://www.google.com/translate_c?hl=it&sl=en&u=http://corz.org/$1 [r,nc]
RewriteRule ^(.*)-pt$ http://www.google.com/translate_c?hl=pt&sl=en&u=http://corz.org/$1 [r,nc]

You can create your menu with its flags or whatever you like, and add the country code to end of the links.. Only certain areas of corz.org are covered by these rules, but they work well enough (though prefers users to ad the -fr manually. hmm).. Want to see this page in french?

Although it is very handy, and I've been using it here for a couple of years here at the org, for my international blog readers, all two of them, heh. Almost no one knows about it, mainly because I don't have any links . One day I'll probably do a wee toolbar with flags and what-not. Perhaps not. Trouble is, the Google translator stops translating after a certain amount of characters, though these same rules could easily be applied to other translators. And if you find a good one, one that will translate a really huge document on-the-fly, do let me know!

If you wanted to be really clever, you could even perform some some kind of IP block check and present the correct version automatically, but that is outside the scope of this document. note: this may be undesirable for pages where technical commands are given (like this page) because the commands will also be translated. "RewriteEngine dessus" will almost certainly get you a 500 error page!

httpd.conf

Remember, if you put these rules in the main server conf file (usually httpd.conf) rather than an .htaccess file, you'll need to use ^/... ... instead of ^... ... at the beginning of the RewriteRule line, in other words, add a slash.

inheritance..

If you are creating rules in sub-folders of your site, you need to read this.

You'll remember how rules in top folders apply to all the folders inside those folders too. we call this "inheritance". normally this just works. but if you start creating other rules inside subfolders you will, in effect, obliterate the rules already applying to that folder due to inheritance, or "decendancy", if you prefer. not all the rules, just the ones applying to that subfolder. a wee demonstration..

Let's say I have a rule in my main /.htaccess which redirected requests for files ending .htm to their .php equivalent, just like the example at the top of this very page. now, if for any reason I need to add some rewrite rules to my /osx/.htaccess file, the .htm >> .php redirection will no longer work for the /osx/ subfolder, I'll need to reinsert it, but with a crucial difference..

this works fine, site-wide, in my main .htaccess file
# main (top-level) .htaccess file..
# requests to file.htm goto file.php
Options +FollowSymlinks
rewriteengine on
rewriterule ^(.*)\.htm$ http://corz.org/$1.php [r=301,nc]

Here's my updated /osx/.htaccess file, with the .htm >> .php redirection rule reinserted..

but I'll need to reinsert the rules for it to work in this sub-folder
# /osx/.htaccess file..
Options +FollowSymlinks
rewriteengine on
rewriterule some rule that I need here
rewriterule some other rule I need here
rewriterule ^(.*)\.htm$ http://corz.org/osx/$1.php [r=301,nc]

Spot the difference in the subfolder rule, highlighted in red. you must add the current path to the new rule. now it works again, and all the osx/ subfolders will be covered by the new rule. if you remember this, you can go replicating rewrite rules all over the place.


troubleshooting tips..

rewrite logging..

When things aren't working out as you'd expect, the first thing you need to do is enable rewrite logging. I'll assume you are testing these mod_rewrite directives on your development mirror, or similar setup, and can access the main httpd.conf file. If not, why not? Testing mod_rewrite rules on your live domain is just a wee bit on the silly side, no? Anyway, here goes, put somewhere at the foot of your http.conf file..

Expect large log files..
#
# ONLY FOR TESTING REWRITE RULES!!!!!
#
RewriteLog "/tmp/rewrite.log"
#RewriteLogLevel 9
RewriteLogLevel 5

Set the file location and logging level to suit your own requirements. If your rule is causing your Apache to loop, load the page, immediately hit your browser's "STOP" button, and then restart Apache. All within a couple of seconds. Your rewrite log will be full of all your diagnostic information, and your server will carry on as before.

Setting a value of 1 gets you almost no information, setting the log level to 9 gets you GIGABYTES! So you must remember to comment out these rules and restart Apache when you are finished because, not only will rewrite logging create space-eating files, it will seriously impact your web server's performance.

RewriteLogLevel 5 is very useful, I find.

Fatal Redirection

If you start messing around with 301 redirects [r=301], aka. "Permanently Redirected", and your rule isn't working, you could give yourself some serious headaches..

Once the browser has been redirected permanently to the wrong address, if you then go on to alter the wonky rule, your browser will still be redirected to the old address (because it's a browser thing), and you may even go on to fix, and then break the rule all over again without ever knowing it. Changes to 301 redirects can take a long time to show up in your browser.

Solution: restart your browser, or use a different one.

Better Solution: Use [r] instead of [r=301] while you are testing . When you are 100% certain the rule does exactly as it's expected to, then switch it to [r=301] for your live site.


conclusion

In short, mod_rewrite allows you to send browsers from anywhere to anywhere. You can create rules based not simply on the requested URL, but also on such things as IP address, browser agent (send old browsers to different pages, for instance), and even the time of day; the possibilities are practically limitless.

The ins-and outs of mod_rewrite syntax are topic for a much longer document than this, and if you fancy experimenting with more advanced rewriting rules, I urge you to check out the apache documentation.

If you have apache installed on your system, there will likely be a copy of the apache manual, right here, and the excellent mod_rewriting guide, lives right here. do check out the URL Rewriting Engine notes for the juicy syntax bits. That's where I got the cute quote for the top of the page, too.
;o)
(or

mod_write trick to Convert http to https

mod_write trick to Convert http to https

Found this tip on an IBM website. If you want to convert all http requests to https requests (useful for secure-ish sites), use the following snippet in your .htaccess file:


RewriteEngine on
RewriteCond %{SERVER_PORT} =80
RewriteRule ^(.*) https://%{SERVER_NAME}%{REQUEST_URI}

COUNTER